Preface


Nowadays, industrial organizations are heavily investing in the digital transformation
of their production processes as part of their transition to the fourth
industrial revolution (Industry 4.0). Based on Cyber Physical Systems (CPS) and
backbone technologies like cloud computing, Industrial Internet of Things (IoT)
and Artificial Intelligence (AI), Industry 4.0 is contributing towards the realization
of flexible production lines, while supporting innovative functionalities like mass
customization, predictive maintenance, Zero Defect Manufacturing (ZDM), and
digital twins. AI is currently the most disruptive digital enabler of the Industry 4.0
era. It facilitates novel use cases like predictive quality management (Quality 4.0),
effective human-robot interaction and collaboration, agile production and generative
product design. AI's disruptive potential is propelled by advances in hardware
and scalable software systems, which have allowed the efficient utilization of
advanced machine learning frameworks and novel algorithms that are suitable for
large-scale problems in realistic settings. Thus, advanced AI technologies tackle
challenges ranging from large-scale optimization to control problems in manufacturing
environments.

Despite these advances, state-of-the-art AI deployments in manufacturing do not
take full advantage of the latest innovative capabilities of machine and deep learning,
or of robotic systems. Rather, they are of limited sophistication
and mostly focus on the consolidation of datasets from heterogeneous sources
to enable advanced analytics (e.g., deep learning) for use cases such as predictive
maintenance and industrial simulations. Real-life manufacturing environments
are complex, dynamic and unpredictable, which highlights safety, reliability
and trustworthiness challenges for the respective AI deployments. Specifically,
real-life deployments of advanced AI systems face challenges in the following areas:


(A) Transparency and Explainability: Nowadays data scientists can understand 
the algorithms used to train AI systems. However, they are in most cases unable 
to reason about their functionality and to explain their operation. As a result, the 
operation of AI systems is usually not transparent and therefore human users cannot 
fully understand their behaviour. Consequently, manufacturing employees are reluctant
to trust AI solutions and accept their deployment on the shopfloor.


(B) AI systems Interaction with the Manufacturing Environment: The suc- 
cessful deployment of AI systems in the manufacturing shopfloor requires their 
effective interaction with the surrounding environment, including cyber and phys- 
ical systems. Such interactions are fundamental for the success of certain machine 
learning models, such as Reinforcement Learning (RL). Nevertheless, the inter- 
action between AI systems and other applications is still typically slow (e.g., non 
real-time), hazardous and risky (e.g., prone to mistakes that could cause physical 
damage). This is a serious barrier for deploying AI systems at scale i.e., systems that 
involve many interactions between AI systems and other elements of the surround- 
ing environment (e.g., software and physical systems). 


(C) Human Centric AI Systems: Despite the expanded use of AI in factories, 
employees remain the most flexible resource. In the years to come, employees, AI 
solutions and robots will co-exist in the manufacturing environment. Thus, AI 
systems must be human centric i.e., able to consider the context of the employee 
and dynamically adapt to it. Likewise, employees must be properly trained to use 
and co-exist with AI solutions in the workplace. This human centred operation of 
AI systems can be very challenging and is not adequately addressed by state-of-the- 
art digital manufacturing platforms. This is also one of the reasons why many AI 
systems operate in isolation from humans. 


(D) AI Cybersecurity Challenges: The deployment of AI systems in the shopfloor 
raises significant security challenges. For example, it makes it possible for attack- 
ers to compromise the operation of a deep neural network either through taking 
control over the system or through altering its input data (i.e. poisoning) in a way 
that outputs malicious decisions. As another example, an adversary can attack an 
AI system towards accessing confidential data or proprietary learning models that 
could lead to IP (Intellectual Property) theft. 


(E) Inaccuracy and Unreliability of Industrial Data: The quality and the quan- 
tity of the data that are used for building AI systems are a decisive factor for their 
proper functioning. For example, limited data quality can be a source of poorly 
performing or biased AI systems. Unfortunately, industrial data are inherently
unreliable, as readings from CPS systems and IoT devices can be skewed by high
temperatures, human errors, hardware malfunctions or even cyber-attacks. There- 
fore, data reliability is a major challenge when building AI systems for use cases like 
quality management, agile production operations and human robot collaboration. 

Overall, the successful deployment of AI solutions in manufacturing environments
hinges on their security, safety and reliability, which becomes more challenging
in settings where multiple AI systems (e.g., industrial robots, robotic cells, Deep
Neural Networks (DNNs)) interact with humans. The safe, reliable and trustworthy
operation of AI systems at scale is a key prerequisite for establishing confidence
in their behaviour and operation. To guarantee the safe and reliable operation of 
AI systems in the shopfloor, there is a need to address many challenges in the scope 
of complex, heterogeneous, dynamic and unpredictable environments. Specifically, 
data reliability, human machine interaction, security, transparency and explainabil- 
ity challenges need to be addressed at the same time. Recent advances in AI research 
(e.g., in deep neural networks security and explainable AI (XAI) systems), coupled
with novel research outcomes in the formal specification and verification of AI sys- 
tems provide a sound basis for safe and reliable AI deployments in production lines. 
However, the legal and regulatory dimension of safe and reliable AI solutions in 
production lines must be considered as well. Hence, the development of technical 
solutions for the robust, secure and safe operation of AI systems in manufacturing, 
along with the study of the legal implications of safe and secure AI in production 
lines are key prerequisites towards an ethical AI in manufacturing as illustrated in 
the guidelines of EU’s High Level Expert Group (HLEG) on AI and reflected in 
the emerging EU regulation for AI. 

To address some of the above listed challenges, fifteen European Organizations 
collaborate in the scope of the STAR project, a research initiative funded by the 
European Commission in the scope of its H2020 program (Grant Agreement Num- 
ber: 956573). Specifically, STAR is a joint effort of AI and digital manufacturing 
experts towards enabling the deployment of standard-based secure, safe, reliable and
trusted human centric AI systems in real-life manufacturing environments. STAR 
researches, develops, and validates novel technologies that enable AI systems to 
acquire knowledge in order to take timely and safe decisions in dynamic and unpre- 
dictable environments. Moreover, the project researches and will deliver approaches 
that enable AI systems to confront sophisticated adversaries and to remain robust 
against security attacks. In this way STAR’s solutions eliminate security and safety 
barriers that hinder the deployment of sophisticated AI systems in real-life produc- 
tion lines. STAR will produce technical solutions that boost the safety, robustness 
and trustworthiness of AI systems in dynamic, real-life settings, while at the same
time exploring the legal implications of a safe and secure AI in prominent manufacturing
scenarios.

This book is co-authored by the STAR consortium partners and aims at providing
a complete and comprehensive review of technologies, techniques and systems
for trusted, ethical, and secure AI in manufacturing. The different chapters of
the book cover systems and technologies for industrial data reliability, responsible 
and transparent artificial intelligence systems, human centered manufacturing sys- 
tems such as human centred digital twins, cyber-defence in AI systems, simulated 
reality systems, human robot collaboration systems, as well as automated mobile 
robots for manufacturing environments. A variety of cutting-edge AI technolo- 
gies are employed by these systems including deep neural networks, reinforcement 
learning systems, and explainable artificial intelligence systems. Furthermore, rele- 
vant standards and applicable regulations are discussed. 

Beyond reviewing state of the art standards and technologies, the book illustrates 
how the STAR research goes beyond the state of the art, towards enabling and 
showcasing human centred technologies in production lines. Emphasis is put on 
dynamic human in the loop scenarios, where ethical, transparent and trusted AI 
systems co-exist with human workers. 


The book consists of 11 Chapters: 


• Chapter 1 (“Blockchain Based Data Provenance for Trusted Artificial
Intelligence”) deals with blockchain based solutions for industrial data reliability.
It presents the advantages of blockchain technologies for tracking and
tracing industrial data. It also reviews different blockchain solutions for digital
manufacturing, including data provenance and reliability solutions. Additionally,
the chapter outlines a solution for tracing data and metadata of AI
algorithms for industrial applications.

• Chapter 2 (“Artificial Intelligence and Secure Manufacturing: Filling
Gaps in Making Industrial Environments Safer”) presents cybersecurity
solutions for AI systems in industrial settings. Specifically, it analyses the security
challenges of AI solutions for smart manufacturing environments. The
analysis focuses on the adversarial models utilized by malevolent entities to
cause malfunctions to AI-powered systems. Moreover, the chapter presents
state-of-the-art approaches for securing machine-learning models, including
deep neural networks. Emphasis is put on attestation-based provenance
mechanisms that guarantee the trustworthiness of data streams feeding AI
systems. Likewise, robust solutions that mitigate adversarial machine learning
attacks are also introduced.

• Chapter 3 (“Knowledge Modelling and Active Learning in Manufacturing”)
is devoted to active learning solutions for manufacturing environments.
It illustrates how the use of active learning techniques to identify the
most informative data instances for which to obtain users’ feedback reduces
friction and maximizes knowledge acquisition. Moreover, the chapter presents
the merits of combining semantic technologies and active learning in manufacturing
use cases.

• Chapter 4 (“Multimodal Human Machine Interactions in Industrial
Environments”) reviews Human Machine Interaction (HMI) techniques for 
industrial applications, with emphasis on multimodal interactions between 
industrial machines and robots. Furthermore, the chapter provides examples 
and use cases in fields related to multimodal interaction in manufacturing, 
such as augmented reality. The chapter concludes by discussing the deploy- 
ment and use of AI and multimodal HMI in the context of the various appli- 
cations in production lines. 

• Chapter 5 (“A Review of Explainable Artificial Intelligence in Manufacturing”)
provides an overview of Explainable Artificial Intelligence (XAI)
techniques in manufacturing applications. It presents how XAI can boost the
transparency of AI models and analyses different metrics that can be used to evaluate
XAI techniques. Moreover, the chapter illustrates practical applications
of XAI techniques in the manufacturing domain.

• Chapter 6 (“Confidence Assessment of AI Models in Simulated Industrial
Environments”) discusses the importance of artificially generated
adversarial scenarios for assessing an AI agent’s confidence level and quality.
It also presents techniques that aim to increase the confidence assessment of
manufacturing focused AI agents, including techniques that span the fields
of Reinforcement Learning, Explainable AI and Visual Analytics.

• Chapter 7 (“The Human-Digital Twin in the Manufacturing Industry:
Current Perspectives and a Glimpse of Future”) explains why and how 
manufacturing workers must nowadays interact with complex production 
systems under challenging conditions. Accordingly, it illustrates how human 
centric Digital Twins can alleviate these challenges. The chapter introduces 
an anatomy of human centred digital twins that can represent humans in the 
digital world, including their intents, behaviours, conditions, and emotions. 
It also explains how such digital twins provide the ground for human-aware 
operations and planning. 

• Chapter 8 (“Video Analytics for Situation Awareness Safe Robot-Human
Cohabitation in Production Lines”) focuses on solutions for dynamically
detecting safety zones in human robot collaboration scenarios. Specifically, it
presents algorithms that analyse scenes from the plant using the global point
of view of a camera network deployed in the factory. Video analytics are used
to detect anomalies and to raise alarms in a timely fashion. Emphasis is put on
the presentation of techniques that detect objects of interest in video streams
and localize them in the 3D environment. The purpose of these video analytics
is to feed a “planner” that dynamically indicates the areas that should be
avoided by a fleet of robots operating in the production lines.

• Chapter 9 (“Human in the Loop of AI Systems in Manufacturing”)
reviews systems and technologies that empower humans and AI actors to
work in synergy. Moreover, the chapter considers the potential emergent outcomes
of such a synergy in a way that goes beyond automation or augmentation.
A model of human-AI interaction is presented, along with techniques
for increasing the efficiency of human-AI collaboration.

• Chapter 10 (“A Review of Industrial Standards for AI in Manufacturing”)
provides a review of industrial standards related to AI solutions in manufacturing
environments, including: (i) Recommendations for human centric
manufacturing systems; and (ii) Technical standards for safety, security and
data management.

• Chapter 11 (“AI That Works”) covers AI-related organizational and management
issues, beyond AI technologies. It presents notions and guidance to
make AI work rather than just function. The chapter promotes an “AI that
works by design” discipline that prepares AI to work by design with embedded
non-functionals for cases when things may go wrong and other risks it
may encounter or cause.


The book is made available as an open access publication, which makes it
broadly and freely available to the AI and smart manufacturing communities. We 
would like to thank “now publishers” for the opportunity and their collaboration in 
making this happen. Most importantly, we take the chance to thank all contributing 
authors for their valuable inputs and collaboration. Finally, we would also like to 
acknowledge funding and support from the European Commission as part of the 
H2020 STAR project, which made this open access publication possible. 


June 2021 
John Soldatos


Chapter 1


Blockchain Based Data Provenance 
for Trusted Artificial Intelligence 


By John Soldatos, Angela-Maria Despotopoulou, 
Nikos Kefalakis and Babis Ipektsidis 


Data reliability is a prerequisite for the development of effective and trusted Artifi- 
cial Intelligence (AI) systems in industrial environments. Unfortunately, industrial 
data tend to be unreliable for a variety of reasons (e.g., environmental influence, 
background noise, and sensor failures). This chapter presents the advantages of 
blockchain technologies for tracking, tracing, and boosting the reliability of indus- 
trial data. It also reviews different blockchain solutions for digital manufacturing, 
including data provenance and reliability solutions. The chapter concludes by presenting
a complete solution for tracing data and metadata of AI algorithms for industrial 
applications. The solution ensures the use of “sealed” AI algorithms leveraging the 
properties that render blockchains resilient to tampering. Its main function is to 
persist the metadata of the algorithms, as well as their outcomes (e.g., prediction
and classification outcomes). As such, it facilitates the implementation of strategies
that secure AI systems in production lines and boost a trusted AI environment in
manufacturing applications.


1.1 Introduction 


During the past decade industrial organizations have accelerated their digital trans- 
formation and their transition to the fourth industrial revolution (Industry 4.0). 
This transition is empowered by the deployment of a proliferating number of inter- 
net connected systems (e.g., embedded sensors and other Internet of Things (IoT) 
devices) and of Cyber Physical Production Systems (CPPS) in the manufacturing 
shopfloor. IoT devices and CPPS systems enable the collection of digital data from 
physical production processes [1]. Likewise, the analysis of these data drives several 
optimizations in areas like logistics, quality management, assets maintenance, pro- 
cess control and supply chain management. These optimizations are implemented 
based on advanced analytics technologies, including BigData analytics, Machine 
Learning (ML) and Artificial Intelligence (AI). The outcomes of such analytics are 
used to optimize production processes and to improve manufacturing decisions.

In recent years, Artificial Intelligence (AI) has been considered the most disruptive
and impactful digital enabler of Industry 4.0. AI systems and algorithms are
nowadays enabling manufacturing use cases with a proven Return on Investment
(ROI), such as predictive maintenance [2], predictive quality management
(Quality 4.0) [3], zero-defect manufacturing [4] and generative product
design [5]. The rise of AI in manufacturing use cases is driven by the explosion
of available data points, the accelerated increase of available computational
capacity, as well as advances in AI software frameworks and tools. Data availability
is a key prerequisite for the development of effective AI systems in
manufacturing, which is the main reason why most national strategies for AI
consider the transition to a data economy as a critical success factor for the
AI era [6].

Unfortunately, industrial enterprises still struggle to collect well-structured and 
high-quality datasets. Industrial data tend to be fragmented across different “siloed” 
systems such as CPPS systems, business information systems (e.g., ERP (Enter- 
prise Resource Planning) system), asset maintenance systems (e.g., Computer- 
ized Maintenance Management System (CMMS)), historian databases, automa- 
tion systems (e.g., Supervisory Control and Data Acquisition (SCADA) systems) 
and more. These systems comprise structured (e.g., sensor data), semi-structured 
(e.g., operations records) and unstructured data (e.g., product images), while having
different semantics, formats, and representations. Likewise, some of the data
have high velocity (e.g., sensor data), while others are simply high-volume data at
rest (e.g., transactional data). The above-listed factors make their integration and
utilization in AI use cases very challenging. To alleviate these interoperability and
integration challenges, several interoperability and BigData management technolo- 
gies have been developed, including technologies based on industrial standards for 
data interoperability. Nevertheless, even with interoperability solutions at hand, 
industrial organizations must also confront one more challenge, namely the unre- 
liability of industrial data. 

Data reliability is one of the most important issues in industrial environments, 
especially when it comes to developing and deploying AI systems. Without reliable 
data, it is almost impossible to develop effective and high-performance AI algo- 
rithms due to the well-known GIGO (Garbage In Garbage Out) phenomenon [7]. 
Furthermore, the use of unreliable data for training AI algorithms is likely to lead 
to biased and unreliable systems. There are many reasons why data reliability is very 
challenging in industrial environments. Specifically, industrial data collection is an 
inherently unreliable process because of: 


• Environmental influences like high or low temperatures, humidity, moisture,
and air pressure factors.

• Background noise such as noise pollution or interference (e.g., alarms, extraneous
speech) and electrical noise from devices like motors, cooling devices,
air conditioning, and power supplies.

• Faulty or inaccurate sensors, i.e., sensing systems with poor precision.

• Dying battery of a system that compromises its ability to operate properly
and provide reliable measurements.

• Compromised or attacked devices that produce biased or fake data due to
adversarial attacks (e.g., data modification, false information injection).

• Compromised AI or BigData analytics algorithms, such as algorithms under
poisoning or evasion attacks [8].


To alleviate data unreliability, industrial enterprises take various measures, 
including data cleansing of databases, quality control on the various data sources, 
as well as reconciliation of conflicting information towards a single version of the 
truth. In the area of data security and resilience, it is important to ensure that data 
infrastructures are cyber-resilient and cannot be tampered with. In this direction, the use
of distributed ledger technologies (most notably blockchains) is suggested in several 
research works. Blockchain technology is widely known as the underlying technol- 
ogy of blockbuster crypto-currencies (e.g., Bitcoin, Ethereum) and other crypto- 
assets. Nevertheless, it is increasingly used in other applications beyond digital 
finance, including industrial applications in sectors like manufacturing, energy, 
agriculture, and supply chain management [9].

Blockchain infrastructures provide the means for decentralized data management,
including ways for decentralized data operations and transactions, such as
CRUD (Create, Read, Update, Delete) operations. They come with many compelling
properties for reliable data operations including:


• Decentralized operation (no single point of failure): Blockchain infrastructures
operate in a highly distributed fashion and do not rely on a trusted
third party for the validation of data transactions. This architectural property
makes them much more difficult to compromise, as they have no single
point of failure. It also ensures their around-the-clock availability.

• Tamper-resilience: Distributed ledgers feature anti-tampering properties.
Changing data written in a distributed ledger requires a next-to-impossible
investment in resources. Depending on the algorithm employed to
achieve consensus within a blockchain network, there are no good incentives
to attempt to tamper with the state of the blockchain. This is a foundation for data
reliability, as blockchain data cannot be changed by adversarial parties.

• Data transparency and auditability: Transactions persisting (meta)data
on a blockchain are transparent and accessible to all members (peers) of a
blockchain network. As such they are auditable and open to scrutiny by other
participants in the distributed ledger infrastructure.

• Security: Blockchain infrastructures offer very secure integrity protection
mechanisms, including data hashing and cryptographic linking among the
various blocks. This boosts their tamper-proof nature and minimizes security
risks. Likewise, it is not possible to hack a blockchain by attacking a few
of its nodes. Blockchains support consensus mechanisms (e.g., majority voting),
which require an absolute majority of nodes to agree on changes to the
blockchain contents. In this way, they are resilient against cyber-attacks that
could compromise one or more nodes.
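
The data hashing and cryptographic linking mentioned above can be illustrated with a minimal, single-process sketch. It is not a blockchain implementation: there is no network, consensus or signing, and the block layout, the function names (`block_hash`, `append_block`, `is_valid`) and the sensor payload are hypothetical choices made only for illustration. The sketch shows why a change to stored data becomes detectable once every block carries the digest of its predecessor.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def block_hash(index, payload, prev_hash):
    """Return the SHA-256 digest of a block's contents (hypothetical layout)."""
    body = json.dumps(
        {"index": index, "payload": payload, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block that stores the digest of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": block_hash(index, payload, prev_hash),
    })
    return chain


def is_valid(chain):
    """Recompute every digest and check each link to the preceding block."""
    expected_prev = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["payload"], block["prev_hash"])
        if block["hash"] != recomputed:
            return False
        expected_prev = block["hash"]
    return True


# Illustrative sensor readings; the values are made up for the example.
ledger = []
append_block(ledger, {"sensor": "temp-01", "value": 71.3})
append_block(ledger, {"sensor": "temp-01", "value": 71.9})
untampered_ok = is_valid(ledger)

ledger[0]["payload"]["value"] = 20.0  # simulate tampering with stored data
tampered_ok = is_valid(ledger)
```

In this sketch an adversary who also recomputes the digest of the modified block still breaks the link stored in the next block, so every later block would have to be rewritten as well. In a real network that rewrite would additionally have to be accepted by the consensus mechanism.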


These properties make blockchain infrastructures appropriate for the implemen- 
tation of manufacturing use cases that involve decentralized control, such as decen- 
tralized automation and distributed data analytics processes [10, 11]. Furthermore, 
blockchain technologies are ideal for implementing supply chain management use 
cases, given the decentralized nature of industrial value chains and the fact that they 
cannot always operate based on trusted third parties [12]. Also, blockchains are very 
appealing when it comes to implementing data provenance functionalities, which 
are among the main pillars of data reliability in industrial environments. In this 
context, the present chapter provides the following contributions: 


• It reviews the use of blockchain technology in manufacturing, through providing
a taxonomy of the most prominent manufacturing-oriented use cases.
In doing so, our review complements other recent reviews (e.g., [9, 13, 14]).

• It provides a detailed presentation of blockchain systems for data provenance
and traceability in industrial environments. As already outlined, data traceability
functionalities are key to ensuring data reliability.

• It introduces a novel blockchain-based approach for AI algorithms and analytics
traceability in manufacturing environments. The presented approach
goes beyond the traceability of industrial data entities to the provenance of
AI algorithms' metadata, as well as of their outcomes (e.g., analytics and classification
outcomes of machine learning techniques). As such it is well suited for
supporting trusted AI use cases in manufacturing, which is the overall theme
of the book and of subsequent chapters.
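
The idea of tracing the metadata of an AI algorithm together with its outcomes can be sketched as follows. This is an illustration under stated assumptions and not the system presented in Section 1.4. The record fields (`name`, `version`, `hyperparameters`, `training_data_id`), the function names and the example values are all hypothetical. The sketch only shows how a digest can "seal" an algorithm's metadata and how each outcome can be linked back to that seal before being written to a ledger.

```python
import hashlib
import json


def fingerprint(obj):
    """SHA-256 digest of a JSON-serialisable object with a stable key order."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal_algorithm(name, version, hyperparameters, training_data_id):
    """Build a metadata record for an AI algorithm (hypothetical field names)."""
    metadata = {
        "name": name,
        "version": version,
        "hyperparameters": hyperparameters,
        "training_data_id": training_data_id,
    }
    return {"metadata": metadata, "seal": fingerprint(metadata)}


def record_outcome(sealed_algorithm, input_id, outcome):
    """Link a prediction or classification outcome to the sealed algorithm."""
    entry = {
        "algorithm_seal": sealed_algorithm["seal"],
        "input_id": input_id,
        "outcome": outcome,
    }
    return {**entry, "entry_hash": fingerprint(entry)}


def matches_seal(sealed_algorithm, candidate_metadata):
    """Check whether candidate metadata is the one that was originally sealed."""
    return fingerprint(candidate_metadata) == sealed_algorithm["seal"]


# Illustrative values only; none of them come from the chapter.
sealed = seal_algorithm(
    "defect-classifier", "1.0.0",
    {"learning_rate": 0.01, "epochs": 20},
    "dataset-A",
)
entry = record_outcome(sealed, "part-000123", {"label": "defective", "score": 0.93})

# An algorithm whose metadata was changed no longer matches the seal.
altered_metadata = dict(sealed["metadata"], version="1.0.1")
```

Persisting `sealed["seal"]` and each `entry` on a tamper-resilient ledger would let an auditor verify which algorithm version produced a given outcome. How the records are actually structured and stored is described later in the chapter.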


To provide the above-listed contributions, the remainder of the chapter is structured
as follows: Section 1.2, which follows this introduction, provides an overview of
the main blockchain use cases in manufacturing. The section concludes by illustrating why data
provenance is one of the most important use cases for manufacturing deployments.
Section 1.3 reviews the main data provenance and traceability solutions that have
been proposed and/or implemented in the research literature. It focuses on manufacturing
applications, yet some examples from other industrial sectors are also
given. Section 1.4 introduces the AI algorithms metadata and AI analytics traceability
system. It presents the structure of the blockchain that is used for the implementation
of the system, along with the main data and metadata tracked. Section 1.5
is the final section of the chapter. It draws the main conclusions and illustrates the
connection of the present chapter to other parts of the book.


1.2 Blockchain Applications in Manufacturing 


1.2.1 Overview 


Blockchain technology is one of the digital enablers of smart manufacturing. It is 
suitable for applications that require reliable, transparent and secure traceability 
of data, including: (i) Traceability of industrial data as part of data provenance 
use cases; (ii) Tracking distributed manufacturing resources towards streamlining 
production processes that involve multiple actors in the manufacturing chain; (iii) 
Secure data sharing in the scope of processes that involve exchange of digital models 
such as Additive Manufacturing (AM) use cases; (iv) Tracking and tracing industrial 
assets in support of processes like maintenance, repairs, and lifecycle assessment; 
(v) Ensuring transparency and auditability of manufacturing chains, including 
coordination of supply chain management and logistic processes; (vi) Increasing the 
cybersecurity of digital manufacturing infrastructures through eliminating single 
points of failure and enabling faster updates of IoT devices (e.g., firmware updates, 
patching); (vii) Development of smart diagnostics and self-service applications for
machines, where the machines themselves will be able to monitor their state, diagnose
problems, and autonomously place service, consumables replenishment, or
part replacement requests to the machine maintenance vendors; (viii) Monitoring
conformity to Service Level Agreements (SLAs), Standards and Regulation by translating
their mandates into self-imposed smart contracts; (ix) Enabling equipment
management, and more specifically identity authentication and authorization; and
(x) Providing a framework to largely automate the subprocesses that compose a
maintenance equipment leasing procedure, such as two-part negotiation, payment
accomplishment and insurance agreement. An overview of relevant use cases and
their benefits is presented in Table 1.1.

Table 1.1. Value propositions of blockchain use cases in manufacturing.

Use Case Type               Value Propositions

Decentralized Automation    Reduced latency for edge computing operations close to the field (automation, analytics).
                            Trusted data sharing between cloud providers, manufacturers and devices.

Secure Information Sharing  Auditability and transparency of information.
                            Information consistency based on consensus mechanisms.
                            Accelerated access to information from the best peer.

Additive Manufacturing      Secure storage of digital assets.
                            Trusted and transparent information sharing for IP protection.

Equipment Authentication    Signing and sealing the interactions of equipment with data.
                            Ensuring transparency and auditability in the use of assets.

Cybersecurity               No single points of failure; more difficult to hack.
                            Automated update of patches and firmware updates through smart contracts.

Conformity to SLAs,         Automated lifecycle stages: discovery and negotiation, deployment, monitoring, billing/penalty, and termination.
Standards and Regulation    Augmented clarity by univocally defining the rules and by recording interactions between physical and non-physical parties in a definitive manner.
                            Trust in environments where parties do not need to cultivate and maintain relationships of trust among them.
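
Use case (viii), the translation of SLA mandates into smart contracts, can be pictured with a small sketch of contract-like logic. It is written in plain Python and not in a smart contract language, and it runs in a single process instead of on a ledger. The class name `SlaContract`, the downtime mandate, the penalty rate and the stage identifiers are hypothetical; only the order of the lifecycle stages follows Table 1.1.

```python
from dataclasses import dataclass, field

# Lifecycle stages as listed in Table 1.1; the identifiers are assumptions.
STAGES = [
    "discovery_and_negotiation",
    "deployment",
    "monitoring",
    "billing_or_penalty",
    "termination",
]


@dataclass
class SlaContract:
    max_downtime_minutes: float   # hypothetical SLA mandate
    penalty_per_minute: float     # hypothetical penalty rate
    stage: str = STAGES[0]
    log: list = field(default_factory=list)  # append-only record of interactions

    def advance(self, next_stage):
        """Allow only the next stage in the fixed lifecycle order."""
        if STAGES.index(next_stage) != STAGES.index(self.stage) + 1:
            raise ValueError(f"cannot move from {self.stage} to {next_stage}")
        self.stage = next_stage
        self.log.append(("stage", next_stage))

    def report_downtime(self, minutes):
        """Record a downtime observation; accepted only while monitoring."""
        if self.stage != "monitoring":
            raise ValueError("downtime can only be reported during monitoring")
        self.log.append(("downtime", minutes))

    def penalty_due(self):
        """Charge the agreed rate for every minute beyond the agreed downtime."""
        total = sum(value for kind, value in self.log if kind == "downtime")
        excess = max(0.0, total - self.max_downtime_minutes)
        return excess * self.penalty_per_minute


# Illustrative run with made-up numbers.
contract = SlaContract(max_downtime_minutes=30.0, penalty_per_minute=2.0)
contract.advance("deployment")
contract.advance("monitoring")
contract.report_downtime(20.0)
contract.report_downtime(25.0)
penalty = contract.penalty_due()
```

Because the rules are fixed in code and every interaction is appended to the log, both parties can recompute the penalty from the same record, which is the kind of clarity that Table 1.1 attributes to this use case.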

The following paragraphs present the main blockchain use cases in manufacturing
environments. 


1.2.2 Decentralized Manufacturing Automation: Intelligence 
Beyond Cloud-based Manufacturing 


Cloud computing is nowadays considered an integral element of most Industry
4.0 deployments. Many manufacturing use cases integrate and analyse data on the
cloud to enable applications like asset management and digital twins. This cloud-based
approach to digital automation has significant advantages stemming from
the scalability, capacity and quality of service offered by cloud infrastructures. Nevertheless,
it also comes with disadvantages such as: (i) The need to transmit data
over a wide area network that results in high latency and is not appropriate for 
low latency use cases that involve automation operations close to the field; and (ii) 
the requirement for continuous internet connectivity, which cannot be taken for 
granted in industrial environments. 

Blockchain technology can alleviate the challenges of cloud-based manufacturing
by enabling decentralized automation approaches that reduce latency and
boost intelligence close to the field. As a prominent example, in [15] and [10]
the authors leverage blockchain technology to implement an edge computing
automation paradigm, including distributed automation and distributed data
analytics functionalities. As another example, in [16] a blockchain has been used
to speed up the flow of production operations based on the coordination of
information flows across manufacturing plants and warehouses by means of CPS
and IoT devices. This work falls in the scope of a broader class of systems that
aim at moving away from cloud-based manufacturing towards supporting real-time
interactions with CPPS systems. Interconnectivity between devices is considered
important in this direction [17].

When compared to cloud-based automation systems, blockchains also provide
increased trust between interacting actors, including users, CPPS systems and
shopfloor services. Specifically, they facilitate trusted data sharing at the
shopfloor level. This is outlined in [18], which proposes two complementary
blockchain-based infrastructures for data sharing: (i) a public blockchain network
that is destined to facilitate trusted interactions between cloud providers and
manufacturers; and (ii) a private blockchain network that boosts trusted data
sharing at the shopfloor level, leveraging machine-level connections for data
collection.


1.2.3 Secure Information Sharing 


Blockchain infrastructures can be used to establish a trusted decentralized
environment for sharing data in a secure way. This fosters the implementation of
secure information sharing use cases. For instance, a blockchain-based system that
boosts information sharing for Injection Mould Redesign (IMR) is described in [19].
The system emphasizes trusted information sharing between blockchain participants,
while at the same time optimizing the efficiency of the sharing processes
by selecting the most appropriate (i.e., fastest) peer for accessing the shared
knowledge.

In another trusted information sharing use case, a blockchain infrastructure 
enables the implementation and execution of smart contracts for the sharing of 
critical information [20]. The blockchain network comprises peers deployed in
manufacturing machines, system-on-chip platforms, and computing nodes. These
peers enable a consortium of disparate organizations to communicate through 
a decentralized network. Trust is boosted by the application of data provenance 
mechanisms based on a proper audit trail. 


1.2.4 Additive Manufacturing 


The implementation of blockchain infrastructure for secure and trusted data
exchange is particularly useful for Additive Manufacturing (AM) applications [21].
The latter leverage digital models of the products and are constantly gaining
momentum in the scope of the Industry 4.0 revolution. This is because they improve
product design, shorten time-to-market, and increase manufacturing agility. One of
the biggest challenges of AM is the secure exchange of data across the stakeholders
involved in the production process, especially given that some of the exchanged
data comprise Intellectual Property (IP) (e.g., digital models of a product) and
other valuable assets. Blockchain technology facilitates the secure storage and
exchange of digital assets, which is the reason why there are many blockchain use
cases in AM.

In a recent research paper [22], blockchain technology has been used for IP rights
management in the context of an AM network. It has also been exploited for
monitoring printed parts through their lifecycle, while tracking process
improvements. The solution is studied in a broader supply chain context, where the
benefits of blockchain (i.e., secure data sharing, enhanced visibility, and
auditability) are highlighted and acknowledged. Another blockchain application for
AM is presented in [23], which focuses on the metal additive manufacturing process
for components of the aircraft industry. The application is essentially a digital
twin that supports the secure end-to-end tracing of data generated during AM
processes.


1.2.5 Equipment Identity Management 


Distributed ledger technologies are well-suited to provide an effective mechanism
that enables equipment management, and more specifically identity authentication
and authorization. Quality control requires strict supervision over which
equipment has clearance to modify which subsets of data collections. What is more,
the logs assembled during the procedure ought to be immutable. By assigning a
digital identifier to each piece of equipment (e.g., a sensor in an IoT
infrastructure) and allowing it to univocally “sign” its interactions with data,
transparency and, therefore, an uncontested single source of truth are established
step by step [24, 25]. In practice, various blockchains and other DLTs provide the
possibility of creating unique accounts, fitted with a pair of cryptographic keys:
a public one that can be universally used for authentication and a private one to
“sign” transactions. In such a configuration, any interaction with data is signed
with the equipment’s private key and can be verified by anyone who has access to
the corresponding public key. This verification proves that the equipment had
access to the private key, and therefore is likely to be the one associated with
the public key. This also ensures that the digital signature has not been tampered
with, as it is mathematically bound to the key it was originally made with. For
their part, smart contracts can be employed for both handling authorization
requests and translating authorization policies into machine-readable
self-executing code.
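
The sign-and-verify flow described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the mechanism of any specific DLT: production ledgers typically rely on elliptic-curve signatures, whereas this sketch swaps in a Lamport one-time signature because it needs only the standard library. Each key pair generated here must sign a single message only, and the equipment record is hypothetical.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Create a Lamport key pair; it is safe for ONE signature only."""
    private_key = [(secrets.token_bytes(32), secrets.token_bytes(32))
                   for _ in range(256)]
    public_key = [(_h(a), _h(b)) for a, b in private_key]
    return private_key, public_key


def _bits(message: bytes):
    """Return the 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private_key, message: bytes):
    """Reveal one secret per digest bit; the revealed secrets are the signature."""
    return [private_key[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public_key, message: bytes, signature) -> bool:
    """Check that each revealed secret hashes to the matching public value."""
    if len(signature) != 256:
        return False
    return all(_h(signature[i]) == public_key[i][bit]
               for i, bit in enumerate(_bits(message)))


# Hypothetical piece of equipment signing one interaction with data.
sensor_private, sensor_public = generate_keypair()
record = b'{"equipment_id": "sensor-17", "operation": "append"}'
signature = sign(sensor_private, record)
print(verify(sensor_public, record, signature))            # True
print(verify(sensor_public, record + b"!", signature))     # False
```

Anyone holding only the public key can run `verify`, which mirrors the property that verification does not require access to the equipment's private key.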

Overall, the use of DLT for equipment identity management provides the
following benefits:

• Security risks related to password authentication are mitigated. For example,
there is no possibility for a third party to use a simple or frequently used
password or to share it unintentionally.

• Authentication of mobile devices, such as phones, tablets, and Augmented
Reality (AR) glasses, is less prone to risk. No cookies or other retrievable
objects remain on the device.

• Storage and logs are immutable per DLT specifications.

• Weak authentication protocols and human negligence do not pose a threat.

• Security regulations can be stricter and are privately manageable, as opposed
to cloud repositories.

• There is no centralized data honeypot for hackers to target.

• There is no need for action if an external user for some reason needs to be
de-certified in the future.


1.2.6 Cybersecurity 


Many of the previously listed use cases come with security-related value
propositions. The latter emerge indirectly, e.g., as part of securing data sharing
processes. However, blockchain technologies can also be used directly for
strengthening the cybersecurity of digital manufacturing infrastructures [14]. For
instance, blockchain technology boosts application decentralization, which
eliminates single points of failure and boosts the distribution of computational
loads across various servers. Likewise, distributed ledger technologies enable
decentralized ways of automating the process of updating or patching IoT devices
based on smart contracts [26]. They are also used to provide decentralized trust
and accountability without relying on trusted parties [27]. Furthermore, using
blockchain technologies, data from IoT devices can be anonymized and remain private
within edge nodes, i.e., hardly accessible to non-authorized users.


1.2.7 Monitoring Conformity to Service Level Agreements
(SLAs), Standards and Regulation


The advantages of translating Service Level Agreements to self-imposed smart con- 
tracts are noteworthy [28]. First and foremost, this process automates their lifecycle 
stages: discovery and negotiation, deployment, monitoring, billing/penalty and ter- 
mination. Furthermore, it introduces clarity, since all rules are univocally defined, 
and transparency, since all interactions between physical and non-physical parties 
are recorded in a definitive manner. Lastly, DLTs are suitable for environments
where parties do not need to cultivate and maintain relationships of trust
among them [29].

In a real-world application, a “master” smart contract can be designed to enforce 
legal standards and agreements of any kind. By cross-examining the data uploaded
by different stakeholders, all parties can verify to what extent the process meets the
predefined regulatory conditions. Once all the requirements are met, the regulatory 
approval may be automatically granted through a smart contract with no further 
need for on-site inspections or in-person verification. 
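
The logic of such a “master” contract can be sketched in plain Python as follows. This is a toy, off-chain illustration and not an actual smart contract: the class name, the requirement names, and the thresholds are hypothetical, and a real deployment would express the same rules in the contract language of the selected ledger.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MasterContract:
    """Toy stand-in for a 'master' smart contract that checks conformity."""
    # Requirement name -> predicate that the submitted value must satisfy.
    requirements: Dict[str, Callable[[float], bool]]
    submissions: Dict[str, float] = field(default_factory=dict)

    def submit(self, requirement: str, value: float) -> None:
        """Record the value a stakeholder uploads for one requirement."""
        if requirement not in self.requirements:
            raise KeyError(f"unknown requirement: {requirement}")
        self.submissions[requirement] = value

    def unmet(self) -> List[str]:
        """List requirements that are missing or whose predicate fails."""
        return sorted(
            name for name, is_met in self.requirements.items()
            if name not in self.submissions
            or not is_met(self.submissions[name])
        )

    def approved(self) -> bool:
        """Approval is granted only once every requirement is met."""
        return not self.unmet()


# Hypothetical SLA conditions for a maintenance service.
contract = MasterContract(requirements={
    "uptime_percent": lambda v: v >= 99.5,
    "max_response_hours": lambda v: v <= 4,
})
contract.submit("uptime_percent", 99.7)
print(contract.unmet())      # ['max_response_hours']
contract.submit("max_response_hours", 3)
print(contract.approved())   # True
```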


1.3 Blockchain-based Data Provenance and Traceability 


1.3.1 Overview 


Data provenance and traceability is one of the most prominent blockchain use cases
in industry. It is commonly proposed in cases where several of the following issues
hold:


• Resilience Concerns: Centralized data provenance databases are more
susceptible to hacking and sometimes do not withstand failures. This motivates
the use of more decentralized infrastructures.

• Multiple Writers in different Trust Domains: There are multiple writers in
the data provenance and traceability database, which may raise trust issues.
This is particularly relevant in cases where writers belong to different
administrative domains and trust domains.

• Lack of clear rules to Control Data input: Blockchain infrastructures are
suitable for implementing data input rules (including validation rules) by
means of consensus mechanisms and smart contracts [30].


1.3.2 Data Provenance Systems 


One of the first and most prominent blockchain-based data provenance
infrastructures is ProvChain [31]. It enables auditing of data operations over cloud
storage in real-time, while supporting access control and intrusion detection. The
infrastructure leverages the tamper-proof properties of blockchain technology by
maintaining a decentralized time-stamped log of user operations along with a
blockchain receipt. Moreover, it supports privacy preservation features by
preventing a direct correlation between users and provenance records, by means of a
hashed user ID. Finally, ProvChain validates provenance data entries by confirming
every block using a blockchain receipt.
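
The combination of a time-stamped, tamper-evident log with hashed user identifiers can be sketched as follows. This is a simplified, single-process illustration and not ProvChain's implementation: the hash chain stands in for the blockchain, the returned hash plays the role of a receipt, and the salt, class, and field names are hypothetical.

```python
import hashlib
import json
import time


def hashed_user_id(user_id: str, salt: str) -> str:
    """Hide the real user ID behind a salted SHA-256 digest."""
    return hashlib.sha256((salt + user_id).encode()).hexdigest()


def _entry_hash(body: dict) -> str:
    payload = json.dumps(body, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


class ProvenanceLog:
    """Append-only log in which each entry commits to the previous one."""

    GENESIS = "0" * 64

    def __init__(self, salt: str):
        self.salt = salt
        self.entries = []

    def record(self, user_id, operation, object_id, timestamp=None) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {
            "user": hashed_user_id(user_id, self.salt),
            "operation": operation,
            "object": object_id,
            "timestamp": time.time() if timestamp is None else timestamp,
            "prev_hash": prev,
        }
        self.entries.append(dict(body, hash=_entry_hash(body)))
        return self.entries[-1]["hash"]  # acts as a receipt for the caller

    def verify(self) -> bool:
        """Recompute every hash and check the links between entries."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or _entry_hash(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = ProvenanceLog(salt="factory-A")
log.record("alice", "write", "dataset-42")
log.record("bob", "read", "dataset-42")
print(log.verify())                       # True
log.entries[0]["operation"] = "delete"    # simulate tampering
print(log.verify())                       # False
```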

There are also solutions that provide traceability and provenance at the level of 
entire data entities. For instance, the ProductChain blockchain [32] aims at keep- 
ing track of processes in the food supply chain. In this direction, a permissioned 
blockchain is employed along with a transaction vocabulary for the target domain. 
The system provides interfaces that enable consumers and other stakeholders to 
access food product provenance information, without disclosing information about 
trade flows. ProductChain provides very good performance (i.e., query response 
times) and is therefore suitable for a broader class of supply chain management 
applications. Beyond product data provenance and traceability in the food chain, 
there are also blockchain systems for manufacturing chains (e.g., composite mate- 
rials traceability) [33] and other agricultural products [34]. The latter are recorded 
in terms of their identity, species name, planting-time, company-name, greenhouse 
number, and geographical location. Likewise, provenance records about agricul- 
tural processes include information about identity, date & time, person, digital- 
signature, location, operation type, and company. 

As another example, the SmartProvenance system leverages smart contracts and
a consensus mechanism [35] to provide reliable data provenance, assuming that the
majority of blockchain participants operate properly. The system supports privacy
preservation using public key encryption and digital signatures. Blockchain data
provenance infrastructures are commonly combined with cloud infrastructures to
provide traceability of metadata such as industrial process configuration
information [36].


1.3.3 Gaps for Trusted AI


The above-listed provenance systems provide a sound basis for the implementation
of data traceability platforms for reliable BigData analytics in industrial
applications. Nevertheless, they are mostly focused on tracing data entities like
products and assets, without provisions for the provenance of machine learning
and deep learning algorithms. The latter is important for the implementation of
consistent, reliable, and trusted AI analytics operations in manufacturing
environments. AI/ML provenance can help detect and mitigate cyber-attacks
against AI systems, such as poisoning. It can also boost the implementation of
configurable cyber-defence strategies that are grounded on auditing the
trustworthiness of the training data of AI/ML algorithms. The traceability of AI
algorithms by means of blockchain infrastructures and smart contracts has a dual
flavour:


• Provenance of AI algorithm metadata and configuration: Blockchain
infrastructures can be used to boost the integrity of the configurations of AI
algorithms and models, such as the weights of the neurons of a deep neural
network or the parameters of linear regression algorithms. In a trusted AI
environment, reliable algorithms that have not been hacked must be used.

• Provenance of AI analytics outcomes: Many manufacturing decisions are
based on the outcomes of analytics operations. To this end, malicious parties
may attempt to compromise the integrity and correctness of analytics outcomes.
Blockchain infrastructures can boost the correctness and consistency
of analytics outcomes to ensure that manufacturing operations leverage the
actual outcomes of trusted AI algorithms, i.e., that the outcomes are trusted
as well.


1.4 Blockchain Data Provenance for Trusted AI
in Manufacturing


1.4.1 Overview and Scope 


The EU-funded STAR project¹ researches, develops and validates technologies that
boost trusted AI in production lines. The project studies technologies that cover
many different AI systems in the manufacturing sector, including machine learning
and deep learning algorithms, as well as human-robot collaboration. It also deals
with the safety of these systems, such as the safe movement of autonomous mobile
robots during their operation in a production plant. The scientific and
technological development areas of the STAR project are presented in other chapters
of the book.

Data reliability is an integral element of trusted AI systems in manufacturing, as
reliable data are needed for training and operating such systems. To ensure
industrial data reliability, STAR develops a blockchain-based data provenance
infrastructure. The latter is destined to leverage the benefits of distributed
ledger technologies that were presented in the previous section. The STAR
blockchain is not intended to substitute conventional battle-hardened databases.
This is because blockchain infrastructures are not best suited for managing large
volumes of data and transactions. Customization and frequent changes are among
blockchain’s top enemies. Furthermore, blockchain cannot compete with conventional
databases and datastores in performance (e.g., in terms of latency) and
responsiveness. Finally, blockchain infrastructures are not suitable for use cases
that require reversibility (e.g., rollback operations). Considering the
above-listed limitations, the STAR blockchain is implemented as a data
infrastructure that complements other data management infrastructures (e.g.,
databases, datastores) in persisting and managing provenance data (i.e.,
“meta-data”) about data entities like AI algorithms and their analytics outcomes.


1. H2020 STAR (Safe and Trusted Human Centric Artificial Intelligence in Future Manufacturing Lines)
Project, Grant Agreement Number 956573, https://star-ai.eu/

Overall, the STAR data management infrastructures exploit the best of both 
worlds: 


(i) State of the art BigData management infrastructures are used to persist large 
volumes of raw transaction data and to support data operations over them; 

(ii) A blockchain infrastructure is used to persist metadata about industrial data
towards offering data provenance and traceability functionalities for industrial
data entities, including AI algorithms and their outcomes. These metadata
boost the cyber-security of the systems and enable the implementation
of security risk mitigation strategies.


1.4.2 Performance Considerations and Blockchain Selection 


Even though the STAR blockchain is not destined to store and manage large volumes
of raw data transactions, it must deliver adequate performance for the provenance
tasks at hand. Specifically, a blockchain for industrial metadata provenance
should provide support for several hundred transactions per second, in order
to persist the metadata and the outcomes of the AI models that are deployed and
executed in a factory.

Public blockchain infrastructures are usually criticized for their poor
performance. For instance, the Bitcoin protocol is one of the slowest blockchain
protocols, as new blocks of transactions are validated every ten (10) minutes on
average. Ethereum is much faster (i.e., approx. 15 seconds per block are required),
yet far from providing decent performance for industrial use cases. This slow
performance of public blockchains is due to their complex mining algorithms, which
safeguard their high security and provide the means for generating new
cryptocurrency units. Likewise, this slow performance comes with poor energy
efficiency, as mining algorithms are energy intensive.
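
A back-of-the-envelope calculation illustrates the gap. The sketch below takes the block intervals quoted above and an assumed target of 300 transactions per second (an illustrative reading of "several hundred", not a measured STAR requirement), and computes how many transactions each block would have to carry to sustain that rate. It ignores confirmation latency, which is a separate concern for near-real-time provenance.

```python
def transactions_per_block_needed(target_tps, block_interval_s):
    """Transactions each block must hold to sustain target_tps on average."""
    return target_tps * block_interval_s


TARGET_TPS = 300  # assumption: illustrative value for "several hundred"

# Block intervals in seconds, as quoted in the text.
BLOCK_INTERVALS = {"Bitcoin": 600, "Ethereum": 15}

for name, interval in BLOCK_INTERVALS.items():
    needed = transactions_per_block_needed(TARGET_TPS, interval)
    print(f"{name}: {needed} transactions per block")
# Bitcoin: 180000 transactions per block
# Ethereum: 4500 transactions per block
```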

While public blockchains are not suitable for industrial applications, there is
another class of blockchain networks that provides performance suitable for
manufacturing use cases. Specifically, this is the case with private permissioned
blockchain networks, where an entity can participate in the distributed ledger
infrastructure only once it has proper permission from the governing entity of the
blockchain. In this blockchain type, the governing entity of the blockchain defines
the operations that each one of the participants can perform on the blockchain in
terms of creation and execution of transactions and smart contracts. Permissioned
blockchain networks consist of a controlled and more limited number of
participants. As such, they need not operate based on complex mining protocols and
Proof-of-Work (PoW), but can rather rely on lightweight consensus protocols such as
Proof-of-Stake (PoS) [Bashir18]. Permissioned blockchains remain slower than
ordinary databases and do not scale to hundreds of thousands of transactions per
second, yet they can achieve a performance of a few thousand transactions per
second. Considering the requirements of the STAR data provenance system, we opted
for a private permissioned blockchain. Moreover, we selected the Hyperledger Fabric
infrastructure for the implementation of the STAR blockchain [37].


1.4.3 Architecture of Data Provenance and Cybersecurity
Sub-System 


A part of the STAR architecture for secure and trusted AI systems is depicted in 
Figure 1.1. Specifically, the figure illustrates the sub-system that deals with the reli- 
ability of industrial data and the security of AI algorithms in industrial environ- 
ments. The sub-system sits between: (i) The digital manufacturing platforms and 
the CPPS systems of an Industry 4.0 shopfloor and (ii) The security teams of the 
factory, such as Factory IT and cybersecurity experts, as well as CERTs (Computer 
Emergency Response Teams) and CSIRTs (Computer Security Incident Response 
Teams). The architecture specifies the following modules: 


• Runtime Monitoring System (RMS) - Data source connectors and
probes: The RMS provides the means for collecting information from the
CPPS systems and the digital manufacturing platforms of the shopfloor. It
comprises several configurable data source connectors and probes, which
perform the data acquisition. Connectors and probes are also in charge of
capturing the metadata associated with each data source and data capture
(e.g., the identifier, the type, and the data formats of the source).

• Data provenance and traceability applications: These are decentralized
applications (i.e., smart contracts) that run over a permissioned blockchain.
They write metadata (i.e., industrial data and AI algorithm configurations)
to the various peers of the blockchain. Moreover, they access data
configurations that are written in the blockchain, including information about data
volumes, statistical data properties, data locations, etc. of various data sources.
Such decentralized applications are used by the cyber-defence strategies of the
sub-system to identify possible security risks associated with AI algorithms.

Figure 1.1. Part of the architecture of the STAR system for secure and trusted AI in
manufacturing.


• AI Cyber-defence strategies: These are security mechanisms that detect
security risks and attacks against AI systems, such as poisoning and evasion
attacks. They are structured in the form of templates that can be
contextualized to different manufacturing environments. In detecting the various
attacks, they leverage information about data configurations that are
persisted in the blockchain. To this end, they access smart contract functionalities
through an appropriate facade and a related API (Application Programming
Interface). For instance, a cyber-defence strategy may use information about
the statistical properties of the industrial datasets that are used to train an
algorithm, to identify a potential poisoning attack. In this case, the blockchain
will provide the actual “sealed” statistical properties, which will differ from
those of the poisoned training data.

• Risk assessment and mitigation engine: This module assesses the importance
of detected risks and proposes actions for their mitigation. It comprises
a Security Knowledge Base (SKB), i.e., a repository of known vulnerabilities
and attacks. The engine consults the SKB towards providing fast resolution of
known attack patterns, as well as fast identification of related mitigation actions.

• Registry of probes, algorithms, templates, and other assets: To support
the configurable and dynamic operation of the system, the various components
are registered in a proper catalogue (i.e., registry). The registry provides
real-time information about the probes, algorithms and templates that are
available, along with information about their status (e.g., active, inactive).
Part of the registry is used by the RMS, as presented in the following paragraphs
of this section.

• Security policies manager: This module configures the operation of the
sub-system by activating and configuring specific data sources, probes,
cyber-defence strategies and AI models. These configurations are provided
in the form of a security policy, which is activated and enforced based on
interactions with the above-listed modules. Security experts and teams are
responsible for specifying and deploying proper security policies.
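
The poisoning check mentioned for the AI cyber-defence strategies can be sketched as a comparison between “sealed” statistics and the statistics of the data that is about to be used for training. The sketch below is a minimal illustration under stated assumptions: the sealed summary is a plain dictionary that stands in for a value retrieved from the blockchain, and the chosen statistics, the function names, and the 5% tolerance are hypothetical rather than part of the STAR implementation.

```python
import statistics


def summarize(values):
    """Statistical properties that would be sealed at registration time."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
    }


def drift_from_sealed(sealed, current_values, tolerance=0.05):
    """Return the properties whose current value departs from the sealed one."""
    current = summarize(current_values)
    findings = []
    if current["count"] != sealed["count"]:
        findings.append("count")
    for key in ("mean", "stdev"):
        scale = max(abs(sealed[key]), 1e-12)   # avoid division by zero
        if abs(current[key] - sealed[key]) / scale > tolerance:
            findings.append(key)
    return findings


# Hypothetical temperature readings used to train a model.
clean = [20.0, 20.5, 19.5, 20.2, 19.8]
sealed = summarize(clean)            # stand-in for the on-chain record

poisoned = clean[:-1] + [35.0]       # one sample replaced by an attacker
print(drift_from_sealed(sealed, clean))      # []
print(drift_from_sealed(sealed, poisoned))   # ['mean', 'stdev']
```

A non-empty result would be handed to the risk assessment and mitigation engine; simple summary statistics will not catch every poisoning attack, so this is only one possible signal.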


1.4.4 Blockchain Network Implementation and Deployment 


Figure 1.2 illustrates how the Blockchain Data Provenance and Traceability service 
interacts with other non-Blockchain modules of the STAR platform. 

It exhibits a rather complex architecture, the assembly of which requires the
use of several interconnected machines, each hosting some of its components, thus
forming a private permissioned Blockchain network. An Organization participating
in the network in this context is a non-Blockchain module of the STAR
architecture, such as the RMS or the Configuration Manager, that benefits
from recording information on the Blockchain. Every entity that interacts with the
Blockchain network acquires its organizational identity from its digital
certificate and its Membership Service Provider (MSP) definition.

Communication of service owners with the Blockchain network takes place
indirectly, via a multi-level Backend application that exposes several APIs
to client applications. Another choice would be to transfer those functionalities
directly to the platform's various service components, which conforms more
naturally with the Blockchain decentralization paradigm, but this would have
required their developers to have extensive expertise in decentralized application
development. A final option would be to make Smart Contracts handle everything
a back-end process does, including the job of the APIs. However, assigning
only specific tasks to each distinct component has been judged to be cleaner
and more manageable.

Figure 1.2. STAR data provenance and traceability service.

The various APIs serve different users: one exists for the Authority tasked with
producing the certificates that will allow participation in the network. Another
serves administrative and monitoring tasks. The most important API serves the
parties recording and retrieving data. Finally, users are authenticated against
an identity management server (for instance Keycloak²), which entitles them to
access the permissioned Blockchain network.

Figure 1.3 provides an anatomy of the Permissioned Blockchain network
implementation of the project, shedding light on the various technologies used
to materialize its components. The building blocks and operational processes
of the Blockchain are directly dictated by the Hyperledger Fabric architectural
paradigm. Specifically, the architecture exhibits a two-level structure: (i) a first
level that comprises different administrative entities (i.e., modules synthesizing
the STAR platform); and (ii) a second level that comprises various peer nodes (i.e.,
sub-components of said modules) within each service. In line with Fabric’s
architecture, the various peers can interact and exchange data through one or more
Channels. Only the peers that participate in a Channel can communicate through
it and share joint ownership of the information stored on the Blockchain. This
provides flexibility in clustering the peers in different groups that engage in
various groups of disjoint transactions. Every node can participate in several
Channels, i.e., it can communicate with different groups of peer nodes. One or more
Smart Contracts, which in the STAR context describe traceability information and
algorithm configurations, are deployed across a Channel.
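
The Channel membership rule described above can be captured in a toy in-memory model. The sketch below does not use the Hyperledger Fabric SDK or any of its APIs; the `Channel` class, the peer names, and the channel names are hypothetical and serve only to show that a peer can read and write solely on the Channels it has joined, while one peer may belong to several Channels.

```python
class Channel:
    """Toy model of a channel: only member peers may write or read."""

    def __init__(self, name, members):
        self.name = name
        self.members = set(members)
        self.ledger = []          # records shared by the channel members

    def _require_member(self, peer):
        if peer not in self.members:
            raise PermissionError(
                f"{peer} is not a member of channel {self.name}")

    def submit(self, peer, record):
        self._require_member(peer)
        self.ledger.append((peer, record))

    def read(self, peer):
        self._require_member(peer)
        return list(self.ledger)


# Hypothetical peers; "rms-peer0" participates in both channels.
traceability = Channel("traceability", ["rms-peer0", "analytics-peer0"])
configuration = Channel("configuration", ["rms-peer0", "config-peer0"])

traceability.submit("rms-peer0", {"data_source": "sensor-17", "rows": 1200})
print(len(traceability.read("analytics-peer0")))   # 1

try:
    traceability.read("config-peer0")               # not a member
except PermissionError as error:
    print(error)
```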

Each (peer) node maintains a ledger of the transactions in which it is involved. To
this end, each peer maintains a database such as Apache CouchDB³. Every time the
global state commonly maintained via the Blockchain changes (e.g., new metadata
about a data source become available), a new transaction is initiated through an
interaction with a Smart Contract. The latter is responsible for consistently
updating said state to reflect the inclusion of the new information. In the
Fabric infrastructure, some nodes can propose and endorse transactions, while
others are only able to propose them. Nodes that can endorse transactions ought to
reach an agreement on the current global state of the data by employing a consensus
algorithm, i.e., Raft⁴ in our case. This increases the flexibility of the
permissions and functionalities that can be granted to the different nodes, each of
which represents a data source within digital manufacturing platforms and CPPS
systems.

Figure 1.3. STAR permissioned blockchain network example.


2. Keycloak Open Source Identity and Access Management System, https://www.keycloak.org/
3. Apache CouchDB, https://couchdb.apache.org/

As specified in the Fabric paradigm, special nodes (called “Orderers”) validate 
the various requests to update the state against the existing configuration, generate 
new configuration transactions, and package them into blocks that are relayed to 
all peers on the Channel. The peers then process the configuration transactions in 
order to verify that the modifications approved by the Orderers do indeed satisfy 
the policies defined in the Channel. 


1.4.5 Data Modelling and Persistence 


To implement industrial data provenance, there is a need for using a proper data 
model of the industrial metadata that will be stored in the Blockchain. In this 
direction, STAR extends background digital models of the authors [38], which 
comprise the following main metadata: 


• Data Source Definition (DSD): Defines the properties of a data source on
the shop floor, such as a data stream from a sensor, a CPPS, or an automation
device.

• Data Interface Specification (DI): The DI is associated with a data source
and provides the information needed to connect to it and access its data,
including details like the network protocol, the port, the network address and more.

• Data Kind (DK): Specifies the semantics of the data source. The DK can be
used to define virtually any type of data in an open and extensible way.

• Data Source Manifest (DSM): Specifies a specific instance of a data source
in line with its DSD, DI and DK specifications. Multiple manifests (i.e.,
DSMs) are therefore used to represent the data sources that are available in
the factory.

• Observation: Models the actual dataset that stems from an instance of a data
source that is represented through a DSM. Hence, it references a DSM, which
drives the specification of the types of the attributes of the Observation in line
with the DK that facilitates the discoverability of the data. An Observation is
associated with a timestamp and keeps track of the location of the data source
in case it is associated with a mobile device or CPPS (e.g., a mobile robot). The
value type of an Observation is a complex object, which is described by the
DK entity that the Observation references. Hence, an Observation can depict
multiple raw measurements coming from a machine, a single value (e.g., the
number of cycles/m of a rotor), or even an analytics result (e.g., the calculated
Remaining Useful Life (RUL) of a machine).

• Edge Gateway: Models an edge gateway of an edge computing deployment,
i.e., a deployment following the edge computing paradigm. In the scope of an
edge computing deployment, data sources are associated with an edge gateway.
This usually implies not only a logical association but a physical association
as well, i.e., an edge gateway is deployed at a station and manages data
sources in close physical proximity to the station.


4. Raft Consensus Algorithm, https://raft.github.io/
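
The relationships between these entities can be sketched with Python dataclasses. This is a hedged illustration rather than the STAR schema: the field names (for example `protocol`, `host`, and `attributes`) and the sample values are hypothetical, and only the entity names and their references to one another follow the descriptions above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DataSourceDefinition:      # DSD: properties of a type of data source
    id: str
    description: str


@dataclass
class DataInterface:             # DI: how to connect to the data source
    id: str
    protocol: str
    host: str
    port: int


@dataclass
class DataKind:                  # DK: semantics, attribute name -> type name
    id: str
    attributes: Dict[str, str]


@dataclass
class DataSourceManifest:        # DSM: one concrete data source instance
    id: str
    dsd: DataSourceDefinition
    di: DataInterface
    dk: DataKind


@dataclass
class Observation:               # dataset produced by the source behind a DSM
    dsm: DataSourceManifest
    timestamp: float
    value: Dict[str, Any]
    location: Optional[str] = None

    def conforms_to_kind(self) -> bool:
        """Check that the value carries exactly the attributes of its DK."""
        return set(self.value) == set(self.dsm.dk.attributes)


@dataclass
class EdgeGateway:               # manages data sources close to a station
    id: str
    manifests: List[DataSourceManifest] = field(default_factory=list)


dsd = DataSourceDefinition("dsd-vibration", "Vibration sensor stream")
di = DataInterface("di-1", "mqtt", "10.0.0.12", 1883)
dk = DataKind("dk-vibration", {"rms": "float", "peak": "float"})
dsm = DataSourceManifest("dsm-press-3", dsd, di, dk)
gateway = EdgeGateway("gw-station-A", [dsm])

obs = Observation(dsm, timestamp=1700000000.0, value={"rms": 0.4, "peak": 1.1})
print(obs.conforms_to_kind())    # True
```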


The above entities are used to represent the data sources of a digital shopfloor in a
modular, dynamic, and extensible way. This is based on the registry of data sources 
and their manifests, which keeps track of the various data sources that register to it. 
Furthermore, to facilitate the management and configuration of analytics functions 
and workflows over the various data sources, several analytics-related entities are also 
specified, including: 


• Analytics Processor Definition (APD): Specifies a processing function to
be applied on one or more data sources. Three types of processing functions
are defined: functions that pre-process the data of a data source (i.e.,
Pre-Processors), functions that store the outcomes of the processing (i.e., Store
Processors) and functions that analyse the data from the data sources (i.e.,
Analytics Processors). These three types of processors can be combined in
various configurations over the data sources in order to define different
analytics workflows.

• Analytics Processor Manifest (APM): Represents an instance of a processor
that is defined through the APD. The instance specifies the type of the processor
and its actual logic by linking to a programming function (e.g., a Java function).

• Analytics Orchestrator Manifest (AOM): Represents an entire analytics
workflow. It defines a combination of analytics processor instances (i.e., of
APMs) that implements a distributed data analytics task. The latter can span
multiple edge gateways and operate over their data sources.
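
The way the three processor types combine into a workflow can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the platform's implementation: the classes `ProcessorManifest` and `OrchestratorManifest`, their fields and the three example functions are hypothetical stand-ins for APM and AOM instances, and the workflow runs in a single process rather than across edge gateways.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


# Hypothetical stand-in for an APM: a processor type plus a link to the
# programming function that implements its logic.
@dataclass
class ProcessorManifest:
    name: str
    kind: str  # "pre-processor", "analytics" or "store"
    logic: Callable[[Any], Any]


# Hypothetical stand-in for an AOM: an ordered combination of processor
# instances that together form one analytics workflow.
@dataclass
class OrchestratorManifest:
    name: str
    processors: List[ProcessorManifest] = field(default_factory=list)

    def run(self, data: Any) -> Any:
        # Feed the output of each processor into the next one.
        for processor in self.processors:
            data = processor.logic(data)
        return data


store: List[float] = []  # illustrative in-memory sink for stored results


def drop_missing(values):
    # Pre-processing: discard missing measurements.
    return [v for v in values if v is not None]


def mean(values):
    # Analytics: compute the average of the cleaned measurements.
    return sum(values) / len(values)


def save(result):
    # Storage: persist the analytics outcome and pass it on.
    store.append(result)
    return result


workflow = OrchestratorManifest(
    name="mean-temperature",
    processors=[
        ProcessorManifest("clean", "pre-processor", drop_missing),
        ProcessorManifest("average", "analytics", mean),
        ProcessorManifest("persist", "store", save),
    ],
)

result = workflow.run([20.0, None, 22.0])
```

Keeping the processors as plain data (a type plus a function reference) is what allows the same functions to be recombined into different workflows without changing their code.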


The digital models for industrial data provenance and data analytics follow a 
hierarchical structure, which defines the different relationships between the vari- 
ous entities. For example, an edge gateway comprises multiple data source man- 
ifests. Each one of the latter is associated with a data source definition. Likewise, 
Observations are associated with instances of data sources, i.e., data source manifests
[Kefalakis19], [Soldatos19]. 

Also, the digital models presented above offer some special characteristics that
make them adaptable to various AI-based, application-specific scenarios such as
predictive maintenance, quality management and zero defect manufacturing (ZDM).
The entities that facilitate such extensions are the Data Kind (DK), Observation
and AdditionalInformation.

Data Kind specifies the semantics of the data source data, which provides
flexibility in modelling different types of data. It can be used to define virtually any
type of data in an open and extensible way. It describes the type, format and
kind of the values that are produced by an AI system. Specifically, the “kind” of the
data is represented with the QuantityKind attribute, which is an abstract classifier
that represents the concept of “kind of quantity”. A QuantityKind represents the
essence of a quantity without any numerical value or unit (e.g., if a sensor, sensor1,
measures temperature, then sensor1 has quantityKind temperature). The Data Kind is
used to describe not only data sources that are used by an AI system, but also data
sources that are produced by the system, i.e., the outcomes of AI/ML analytics.

Observation entities model the actual data that stem from an instance of a data
source (i.e., modelled through a DSM). Hence, an Observation references a DSM,
which drives the specification of the types of the attributes of the Observation in
line with the DK. An Observation is associated with a timestamp and keeps track of
the location of the data source in case it is associated with a mobile (rather than a
stationary) data source. Hence, it has a location attribute as well. An Observation
holds the measurement or result of a Data Source in its “value” attribute, which is of
type anyType. This means that it can support any type of value (even complex
structures) identified by the Data Kind it references. Similar to the Data Kind and
the Data Source, the Observation is used to describe not only data that are captured
by an AI system but also data that are produced by the system (i.e., AI analytics
produce Observations).

Figure 1.4. Snapshot of the digital models metadata.

Finally, AdditionalInformation is a generic entity which allows the extension
of the existing data model with additional attributes that may be required. For
instance, AdditionalInformation entities are used in the EdgeGateway and Core
digital model structures to provide optional auxiliary fields that can be used for
further extensions.
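
The three extensible entities described above can be pictured as plain data structures. The following Python sketch is an assumption-laden illustration rather than the actual schema: the field names (`quantity_kind`, `value_format`, `dsm_id`, `additional_information`) and the example identifiers are hypothetical, and only the relationships (an Observation references a DSM, which references a DK; the value may be of any type) are taken from the text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Tuple


@dataclass
class DataKind:
    name: str
    quantity_kind: str  # essence of the quantity, without value or unit
    value_format: str   # hypothetical field: how the value is encoded


@dataclass
class DataSourceManifest:
    dsm_id: str
    data_kind: DataKind


@dataclass
class Observation:
    dsm: DataSourceManifest  # the data source instance that produced it
    timestamp: datetime
    value: Any               # "anyType": a scalar or a complex structure
    location: Optional[Tuple[float, float]] = None  # for mobile sources
    # Generic extension point in the spirit of AdditionalInformation.
    additional_information: Dict[str, Any] = field(default_factory=dict)


now = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)

# A raw measurement: sensor1 has quantityKind temperature.
temperature = DataKind("shopfloor-temperature", "temperature", "float")
sensor1 = DataSourceManifest("sensor1", temperature)
reading = Observation(dsm=sensor1, timestamp=now, value=21.5)

# An analytics outcome modelled with the same entities.
rul_kind = DataKind("rul-estimate", "time", "object")
rul_source = DataSourceManifest("rul-analytics", rul_kind)
rul = Observation(
    dsm=rul_source,
    timestamp=now,
    value={"machine": "press-01", "rul_hours": 120},
    additional_information={"model_version": "hypothetical-v1"},
)
```

Because the value is untyped and interpreted through the referenced Data Kind, a sensor reading and an analytics result share one structure.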

Figure 1.4 presents a snapshot of the different entities that are used by the
different components. These are persisted in the blockchain to support provenance and
traceability functionalities at different levels and for different AI applications. For
example, the Edge Gateway data entities provide discoverability of the Edge systems
and enable configuration and management of the data provenance system. DSM
entities provide global and local visibility of the different industrial data sources
(i.e., DSDs) used by the AI systems. This also enables discoverability of DSDs and
of the Observed data that they comprise. Furthermore, the DSD can be used at
configuration time to associate the data source with some AI algorithm and its
outcomes. Finally, the persistence of AOM information facilitates the configuration
of AI analytics functions and enables the dynamic discovery of analytics outcomes
(i.e., Observations) based on the AOM they are associated with, i.e., based on the
AI data and algorithms that produced these observations.
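
To make the provenance idea concrete, the sketch below stores entity records in a hash-chained, append-only list and looks up Observations by their AOM. It is a deliberately simplified stand-in and not the blockchain infrastructure of this chapter: there is no distribution, consensus or smart contract, and the class `ProvenanceLedger`, its methods and the record fields are hypothetical names chosen for illustration.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder hash that precedes the first block


class ProvenanceLedger:
    """Append-only list of records, each linked to its predecessor by hash."""

    def __init__(self) -> None:
        self.blocks: List[Dict[str, Any]] = []

    @staticmethod
    def _digest(record: Dict[str, Any], prev_hash: str) -> str:
        payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, record: Dict[str, Any]) -> str:
        prev_hash = self.blocks[-1]["hash"] if self.blocks else GENESIS
        block_hash = self._digest(record, prev_hash)
        self.blocks.append({"record": record, "prev": prev_hash, "hash": block_hash})
        return block_hash

    def verify(self) -> bool:
        # Recompute every hash; any modified record breaks the chain.
        prev_hash = GENESIS
        for block in self.blocks:
            expected = self._digest(block["record"], prev_hash)
            if block["prev"] != prev_hash or block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True

    def find_by_aom(self, aom_id: str) -> List[Dict[str, Any]]:
        # Discover analytics outcomes through the AOM that produced them.
        return [b["record"] for b in self.blocks if b["record"].get("aom") == aom_id]


ledger = ProvenanceLedger()
ledger.append({"type": "DSM", "id": "sensor1"})
ledger.append({"type": "AOM", "id": "rul-workflow"})
ledger.append({"type": "Observation", "aom": "rul-workflow", "value": 120})

intact = ledger.verify()
outcomes = ledger.find_by_aom("rul-workflow")

ledger.blocks[2]["record"]["value"] = 999  # simulate tampering with a record
tampering_detected = not ledger.verify()
```

The lookup by AOM mirrors the dynamic discovery of analytics outcomes described above, while the verification step shows why a modified Observation cannot go unnoticed.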
Figure 1.5. Run-time monitoring system in the trusted AI infrastructure.


1.4.6 Run-Time Monitoring System 


The Runtime Monitoring System (RMS) collects industrial data in real-time and 
stores them for further processing and analysis by AI algorithms. RMS features 
lightweight monitoring probes that are responsible for the data collection and pub- 
lishing to AI platforms. The RMS provides configuration and management mech- 
anisms over the monitoring probes as well as data models and data transformation 
engines that will enable the discoverability and reusability of the collected data. The 
probe management is facilitated by an internal probe registry that maintains infor- 
mation about the probes (including their status), while enabling probe creation, 
reconfiguration, and discovery. Figure 1.5 illustrates a functional diagram of the 
main system components. 
The main functionalities and interactions of these components are as follows: 


• Data bus: This is a communications channel that routes real-time data.
Platform components may subscribe to the data bus to receive data of specific
interest.

• Deployed probes: Probes collect data from the target IoT system or
application and stream them to the IoT platform through the data routing
component.

• Probe Management and Configuration: This module is responsible for
managing and configuring the deployed probes. It can receive automatic
probe configuration commands and configures the managed probes
accordingly. Manual probe configuration commands may also be received through
the dashboard. The Management and Configuration dashboard provides a
user interface to the Probe Management and Configuration component.

• Probe Registry: Maintains a record of the deployed probes. Probe
deployment data, as well as state and configuration data, are maintained
by the registry. The registry provides probe creation, reconfiguration, and
search capabilities. It facilitates the automatic deployment of probes and their
dynamic discovery.

• Automatic Reconfiguration: This sends automatic probe reconfiguration
commands in line with the implemented security policy.

• Data Storage: This contains historic security data that have been collected
by the deployed probes. These data can be used by the Data Analytics component
to train itself and produce a set of security templates that will subsequently be
used for identifying security issues on the target IoT system.

• Configuration Management Database (CMDB): This is part of the data
storage. It contains information about all assets of the RMS, including their
attributes and configuration parameters.
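
The interplay between the data bus and the probe registry listed above can be sketched as follows. This is an in-memory illustration only, under the assumption of a simple topic-based publish/subscribe model: the classes `DataBus`, `Probe` and `ProbeRegistry`, their methods and the `interval_s` setting are hypothetical and do not correspond to the Elastic Stack components used in the actual implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


class DataBus:
    """Routes messages to the components subscribed to a topic."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> None:
        for handler in self._subscribers[topic]:
            handler(message)


@dataclass
class Probe:
    probe_id: str
    topic: str
    config: Dict[str, Any] = field(default_factory=dict)
    status: str = "deployed"


class ProbeRegistry:
    """Keeps deployment, state and configuration data of the probes."""

    def __init__(self) -> None:
        self._probes: Dict[str, Probe] = {}

    def create(self, probe_id: str, topic: str, **config: Any) -> Probe:
        probe = Probe(probe_id, topic, dict(config))
        self._probes[probe_id] = probe
        return probe

    def reconfigure(self, probe_id: str, **config: Any) -> Probe:
        probe = self._probes[probe_id]
        probe.config.update(config)
        return probe

    def search(self, **criteria: Any) -> List[Probe]:
        # Return the probes whose attributes match all given criteria.
        return [
            p for p in self._probes.values()
            if all(getattr(p, key) == value for key, value in criteria.items())
        ]


bus = DataBus()
registry = ProbeRegistry()
received: List[Dict[str, Any]] = []

probe = registry.create("probe-1", "vibration", interval_s=5)
bus.subscribe("vibration", received.append)  # a platform component subscribes
bus.publish(probe.topic, {"probe": probe.probe_id, "value": 0.42})

registry.reconfigure("probe-1", interval_s=1)  # e.g. an automatic command
found = registry.search(topic="vibration", status="deployed")
```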


RMS is implemented based on technologies of the Elastic Stack, as outlined in
Figure 1.6.

Figure 1.6. Implementation technologies of the run-time monitoring system.


1.5 Conclusion


Blockchain technologies offer advantages for data provenance and traceability in
industrial environments. These advantages stem from the secure, tamper-proof
and decentralized nature of distributed ledger technologies. In recent years,
the research community has developed and demonstrated various blockchain-based
systems for secure information sharing and data provenance in the scope of
manufacturing applications. These systems provide the means for tracking industrial data
entities (e.g., products, assets), as well as processes performed over them.
Nevertheless, they lack support for tracking and tracing AI algorithms, models, and their
analytics outcomes. This is a setback for their use in the emerging wave of trusted
AI applications. As part of this chapter, we have presented a novel blockchain
infrastructure that supports data provenance for AI models and algorithms towards
boosting trusted AI in industrial environments. The presented blockchain provides a
foundation for data reliability. As such, it blends nicely with the AI systems that are
presented in later chapters of the book.

The presented blockchain infrastructure is in its early implementation stages. In 
addition to completing its implementation and validation, we plan to benchmark 
its performance and scalability in real-life manufacturing environments. Moreover, 
we will collect feedback from manufacturing stakeholders, including practitioners 
with field experience. This will lead us to conclusions about the practical applicability
of blockchain technology in production lines. Likewise, it will help the research
community identify the next steps that could move blockchain technology from
the realm of pilot experiments to practical enterprise-scale deployments in
industrial environments.


Artificial Intelligence and Secure
Manufacturing: Filling Gaps in Making
Industrial Environments Safer


By Entso Veliou, Dimitrios Papamartzivanos, Sofia Anna Menesidou, 
Panagiotis Gouvas and Thanassis Giannetsos 


This chapter aims to review, from the security standpoint, the artificial intelligence
solutions used to empower smart manufacturing environments. Our analysis will
focus on the adversarial models utilized by malevolent entities in order to cause
malfunctions in AI-powered systems, both during the training process and during
the inference mode of the leveraged machine learning models. Such attacks can
have a significant impact on the operation of the manufacturing supply chain
ecosystem, as they can affect not only business continuity but, more importantly, the
integrity of safety-critical operations of systems. Towards this direction, this
chapter reviews the state of the art in technical approaches to secure machine learning
models and paves the way towards the safe adoption of such measures in the
manufacturing field. The focus is on the new generation of artificial intelligence setups
that use deep neural network structures at their core. In addition, the chapter elaborates
on attestation-based provenance mechanisms that guarantee the trustworthiness
of data streams feeding AI systems. The goal is to highlight the need for robust
solutions against adversarial machine learning attacks for such environments and
to provide additional insights on the appropriate mitigation strategies against such
intelligent aggressors.


2.1 Introduction 


For many years manufacturing systems lacked information and data security.
Recently, however, everything in the manufacturing supply chain ecosystem has changed.
Ethernet and the IP protocol layer became the next big thing, with cost, the need
for automation and convenience among the driving factors for this change.
Networks became a core part of the manufacturing field and currently
interconnect wider and more complex manufacturing floors. Hence, connectivity,
along with increased sensing capabilities and the desire to reduce
installation costs, gave birth to an increased demand for wireless networks, multiple
IoT devices, and human-robot interaction, which is blooming as the new era
for smart factories. The evolution of human-robot collaboration and the Internet of
Things has a major impact on manufacturing processes and the working environment,
as new services can be developed by the integration of the physical
and digital worlds. Moreover, this progress has an impact on the physical security
of the workers and the overall safety in smart factories, because
human-robot collaboration will give workers a more privileged
job position in which the robot handles most of the dangerous and demanding
parts of the job. Smart devices and networks with improved capabilities can have a
significant impact on the users’ well-being and on the everyday activities and
procedures in a manufacturing environment with the emergence of new
“systems-of-systems” (SoS).

In addition to the above, the manufacturing landscape is rapidly changing through
the penetration of artificial intelligence solutions that primarily aim to boost the
productivity of the manufacturing operation process. In fact, artificial intelligence
is revitalizing the smart manufacturing domain with the integration of advanced
analytic methods capable of processing huge amounts of data collected by the
multiple IIoT devices. Based on this, predictive maintenance for minimising operation
and maintenance costs, improved supply chain management, automated quality
control, efficient and safe human-robot collaboration and buyer-centric
manufacturing are prominent examples of added-value services that have emerged as a result
of the integration of AI in the manufacturing field.

Undoubtedly, the digitisation of the manufacturing field, in combination with
the infiltration of AI into the production processes, has led to the formation of a rather
complex cyber-threat landscape for smart industries. More specifically, the threats
that emerge as a result of the integration of legacy ICT technologies have been
widely documented in the literature [1], while several reports have documented
threat taxonomies in this direction [2]. Notably, when it comes to the documentation
of AI-specific threats, in other words, attacks that specifically target AI-empowered
systems and the leveraged AI methods, only recently has the community started
to document possible attacks that can disrupt the operation of such systems [1, 3].
In this direction, this chapter aims to shed light on the underpinnings of
AI-fuelled smart manufacturing and, in parallel, to put forth adversarial techniques
that can be used against such AI methods. More specifically, the focal point of
this work is the detailed investigation of the most prominent types of attacks,
namely poisoning and evasion attacks [3-6]. Poisoning attacks attempt to train
the deep neural networks in ways that compromise their correct operation through
the inclusion of intentionally malformed instances in the training set of AI
algorithms. Evasion attacks take place at the inference stage of a deep neural network,
where malicious parties craft data that are incorrectly classified by deep learning
systems.
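
The difference between the two attack types can be illustrated with a toy example. The Python sketch below is an assumption made for readability and not an attack on a deep neural network: it uses a one-dimensional threshold classifier (halfway between the class means), invented sensor readings, and hypothetical function names. Poisoning changes the model by corrupting the training set, whereas evasion leaves the model untouched and perturbs a single input at inference time.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (sensor reading, label: 0 = normal, 1 = faulty)


def train(samples: List[Sample]) -> float:
    """Return a decision threshold halfway between the two class means."""
    normal = [x for x, label in samples if label == 0]
    faulty = [x for x, label in samples if label == 1]
    return (sum(normal) / len(normal) + sum(faulty) / len(faulty)) / 2


def predict(threshold: float, x: float) -> int:
    return 1 if x > threshold else 0


clean_data = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
clean_threshold = train(clean_data)

# Poisoning: the attacker injects faulty-looking readings that are labelled
# as normal, which drags the learned threshold upwards.
poison = [(10.0, 0), (10.0, 0), (10.0, 0)]
poisoned_threshold = train(clean_data + poison)

faulty_reading = 6.5
caught_by_clean_model = predict(clean_threshold, faulty_reading)
missed_by_poisoned_model = predict(poisoned_threshold, faulty_reading)

# Evasion: the model is the clean one, but the attacker slightly perturbs
# a faulty input so that it crosses the decision boundary.
original_input = 5.5
perturbed_input = original_input - 0.6
original_label = predict(clean_threshold, original_input)
evaded_label = predict(clean_threshold, perturbed_input)
```

In this toy setting the poisoned model no longer flags the reading 6.5 as faulty, and the perturbed input 4.9 is classified as normal by the clean model although the original input 5.5 was flagged.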

In view of the above, this sets the challenge ahead: “To what extent can AI
adversarial techniques affect intelligent manufacturing systems, and what are the defensive
actions that can guarantee the robustness of the AI systems towards achieving increased
resilience of the production lines and business continuity?”

To address this question, Section 2.2 offers an analysis of the smart
manufacturing stack by highlighting the engagement of AI solutions in the
manufacturing processes. Given this analysis, Section 2.3 highlights the cyber security
posture of AI-fuelled manufacturing systems by documenting impactful
vulnerabilities and threats. Section 2.4 documents the importance of solutions, such as
attestation, that can guarantee the integrity of data flows fed into machine
learning data pipelines. Section 2.5 offers a discussion and critique of the field’s
current baseline, before Section 2.6 elaborates on the road ahead and discusses novel
solutions that can increase the resilience of AI setups.

Overall, the motivation of this work is to set the scene on the need for secure
AI-based systems for manufacturing environments that can not only enable an efficient
decision-making process but can also withstand a prolonged siege from an attacker
targeting either the integrity of the input data or the correctness of the classification
model and process. Having identified the challenges and current hurdles, we also
put forth a road-map of future research avenues that we need to consider if we
are to fruitfully benefit from the Industry 4.0 revolution.


2.2 Hardening the Smart Manufacturing Stack: Towards
Inter-Trustability of System-of-Systems


Security intelligence in smart manufacturing is widely used to solve security
problems, such as incident prevention, detection, and response, by applying machine
learning and other data-driven methods. The selection of intelligence sources and
feeds is vast and growing, as is the choice of methods that can be applied, while
the problems evolve and new ones appear. To this end, as mentioned above, there is
a large body of prior work that solves security problems in specific scenarios, using
specific types of data and specific algorithms [3-6]. Being specific has the
drawback that it becomes hard to adjust existing solutions to new scenarios, data, or
problems. Furthermore, all prior work that strives to be more general is able to
work either with complex relations (graph-based) or with time-varying
intelligence (time series), but never both. While there exist solutions to spatio-temporal
problems in graph machine learning, they do not satisfy the following conditions:
(1) heterogeneity of attributed nodes, (2) time-dependence of the nodes and their
attributes, (3) time-dependence of the relationships, (4) scoring of the nodes, and
(5) arbitrary interactions that are not necessarily bipartite (i.e., hyperedges).

In this context, security intelligence data, or simply intelligence, must relate
to something of relevance to the security of the system of interest, i.e., one or more
specific instances of some entity types, and it must describe the entity (or entities),
either through attribute(s) or by their relationship. Examples include knowledge that a
device/sensor exists on the network of concern (identifies an instance, e.g., by a
securely generated ID), that the device is turned on (an attribute that describes the
state of the sensor), and that the device has used the Domain Name System (DNS)
to resolve a domain name (an interaction between the client and domain entities).

The complete body of all security intelligence is not practically available, but
parts of it can be observed. The types of intelligence we consider also include
enriched observations, such as the relation between a device's ID and the hostname
obtained via reverse lookup in the underlying (programmable) network
infrastructure. Either way, monitoring of data is one approach to observing intelligence, which
network owners can use, for instance, to gain insights into the traffic circulating on a
smart manufacturing floor, yielding intelligence like the above. Another option is
to source intelligence from others, via public or private feeds, either for free or under
some commercial agreement.

Whether intelligence is sourced from monitoring controlled systems, third
parties, or elsewhere, the arrival of new intelligence is expected to occur at
specific points in time, either because monitoring reveals events from observed data or
because new data from a feed has arrived. To capture this, we define an event to
be a timestamped observation of intelligence data, where an observation may for
example be a first-time observation, the interval since the last modification or an
affirmation that the previous intelligence data is still current. For instance, a data
transmission from a device is an event which provides several pieces of intelligence:
there is a sensor on the network that has a certain ID, it is active, and it is related
to the domain name in question.
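
The notion of an event defined above can be sketched as a small data structure. The Python code below is illustrative only: the names `ObservationKind`, `Event` and `event_from_transmission`, the dictionary layout of the intelligence items, and the device and domain identifiers are hypothetical, and the `MODIFICATION` member is one possible reading of the "interval since last modification" case.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Dict, List


class ObservationKind(Enum):
    FIRST_OBSERVATION = "first-time observation"
    MODIFICATION = "modified since last observation"
    AFFIRMATION = "previous intelligence still current"


@dataclass(frozen=True)
class Event:
    """A timestamped observation carrying one or more pieces of intelligence."""
    timestamp: datetime
    kind: ObservationKind
    intelligence: List[Dict[str, Any]]


def event_from_transmission(device_id: str, domain: str, when: datetime) -> Event:
    # One data transmission yields several pieces of intelligence at once.
    return Event(
        timestamp=when,
        kind=ObservationKind.FIRST_OBSERVATION,
        intelligence=[
            {"entity": "device", "id": device_id},                  # it exists
            {"entity": "device", "id": device_id, "active": True},  # attribute
            {"relation": "uses-domain", "device": device_id, "domain": domain},
        ],
    )


event = event_from_transmission(
    "sensor-17",
    "telemetry.factory.example",
    datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc),
)
```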

In the above, we have explained how intelligence can be obtained from
monitoring, external sources, and enrichment, but it may also be obtained from
machine learning, heuristics, manual processing and more. Common to all these
processes is that they take some intelligence as input and produce some new or
updated intelligence as output. We refer to this type of process as a map process,
which encapsulates the knowledge of a variety of domain experts into an
automated framework that enriches intelligence. In what follows, we look in more
detail at the types of information sources that can be considered
as part of this map process; essentially, the actors that comprise this new paradigm
of smart manufacturing systems that organize and integrate real-time knowledge
between physical objects and the virtual computational space [8].
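
A map process, as described above, is simply a function from intelligence to new or updated intelligence, which means that several such processes can be chained. The following Python sketch assumes a dictionary representation of intelligence; the functions `reverse_lookup`, `flag_unknown_host` and `apply_all`, as well as the lookup table and identifiers, are hypothetical examples of an enrichment step and a heuristic step.

```python
from typing import Any, Callable, Dict, List

Intelligence = Dict[str, Any]
MapProcess = Callable[[Intelligence], Intelligence]

# Illustrative lookup table standing in for a reverse lookup service.
KNOWN_HOSTNAMES = {"sensor-17": "press-line-2.factory.example"}


def reverse_lookup(intel: Intelligence) -> Intelligence:
    # Enrichment: relate the device ID to a hostname.
    enriched = dict(intel)
    enriched["hostname"] = KNOWN_HOSTNAMES.get(intel["device_id"], "unknown")
    return enriched


def flag_unknown_host(intel: Intelligence) -> Intelligence:
    # Heuristic: devices without a known hostname are marked as suspicious.
    flagged = dict(intel)
    flagged["suspicious"] = intel.get("hostname") == "unknown"
    return flagged


def apply_all(intel: Intelligence, processes: List[MapProcess]) -> Intelligence:
    # Each map process consumes intelligence and emits updated intelligence.
    for process in processes:
        intel = process(intel)
    return intel


pipeline = [reverse_lookup, flag_unknown_host]
known = apply_all({"device_id": "sensor-17"}, pipeline)
unknown = apply_all({"device_id": "sensor-99"}, pipeline)
```

Each step returns a copy instead of modifying its input, so the original intelligence remains available alongside the enriched version.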


2.2.1 Data Source & Security Requirements of Industry 4.0:
Smart Manufacturing Processes, Actors
and Safety-Critical Use Cases


Towards this direction, additional Cyber-Physical Systems (CPS), such as reliable
indoor positioning systems and activity recognition systems (e.g., motion capturing
sensors), together with AI-based software solutions, are among the enabling
technologies that need to be leveraged. The incorporation of robotics into industrial
systems has accelerated over the last decade, and there are no signs of a
slowdown on the horizon. Because of regulatory and business measures, such as the
German-created Industry 4.0 [7], the expanded use of robotic architectures could
be an unintended result of parallel advances in a few related fields [9]. Plant systems
(machines, conveyors, and so on), cognitive devices, and the cloud will
connect and share data in real time using the existing network infrastructures. Each
of the machine components, seen as a unit, collaborates effectively with the others to
achieve versatility and stability. Operators must deal with problems, including packet
losses and inefficiencies, that may occur as a result of incompatibilities. To reduce
packet loss, massive data feedback mechanisms are required [10]. Sensors play a crucial
role in the application of IoT and CPS in a delivery device. A sensor is described
as a device that detects a stimulus of some kind, such as light or humidity, and sends
a signal to a monitoring or controlling endpoint. It is a good resource for
converting data from the surrounding world into data in a cybernetic environment. It is
proposed that self-aware and self-monitoring systems be used to capture and relay
the information from the production process in actual environments [11]. When
building a managed work environment with the widespread use of smart
appliances, process management is often encouraged. To ensure effective
communication across devices for several monitoring processes, IoT devices are further split
into categories, with each class of sensors loosely deployed in a sub-area of a big
factory or a long product design and development line [10]. These advancements,
particularly in software engineering and automation, have allowed separate
mechanisms to use smart data analysis to build process information awareness that can be
used to illuminate the operational behaviour of the systems and manufacturing fields.


2.2.1.1 Ideal operational requirements


The development of manufacturing advances and new processes is expected
to continue in the future. Modern materials, components and objects will
emerge [12]. Injection molding is an example of a modern technique that has
accelerated thanks to the innovation of modern technologies, changed the development
and manufacturing of products, and opened the way to previously untapped
areas such as biomanufacturing. Manufacturing equipment, for example, devices
intended for standardized and lateral machining, as well as penetration, has been
developed to manage different activities. Further convergence of this type will occur,
such as the use of advanced products, item schedules, and production procedures, for
example the identification of a chemical substance that relates to the creation of a new
medication, a delivery mechanism, as well as medication production and the device.
New-age robots, which are very inexpensive to build and maintain, take smart
factory automation to unprecedented levels. IoT devices and application functions
make new-era smart manufacturing systems more intelligent and better suited for
communication within the plant and beyond.

Over time, these ideal manufacturing advances increase manufacturing speed and
productivity. Traditionally, productivity is defined as a measure of the degree of
output compared with a given input. Examples of inputs are individual
working hours, device hours, and materials. Productivity may be measured at different
tiers of the organizational hierarchy, from an individual device to the entire
organization. Productivity is distinct from commonly used overall performance goals
such as return on investment (ROI), which is a cost-based measure frequently used
at the very highest levels of the organization. With the aid of artificial intelligence,
a device can adjust its behaviour depending on its own knowledge, and if
it has sophisticated tracking systems, it can, for example, use cognitive computing
to automate its processes and be accurate and precise. These activities and
applications are susceptible to improvements in integration and may benefit from artificial
neural networks. They should therefore be viewed as part of an intelligent control
system. Independence exists where (a) a device may respond to feedback and carry
out its actions to achieve a specified goal, and (b) the unit wishes for the feedback
loop to function. Advanced control technology is needed. As a result, independence
must be a component of particular value. A device is said to be fully automated if
it can automatically execute its own operation, although the level of automation
varies from device to device [13].


2.2.1.2 Operational and performance assurance


Manufacturers usually need technological skills to monitor the range and form of
technology widely available to upgrade their processes, which poses a significant
problem for Industry 4.0 and smart manufacturing. Provisional application creation
and evaluation are often carried out in laboratory environments, which may
preclude the software from being publicized and used due to deployment challenges
that go unnoticed by the developer. To establish that smart technological
developments integrate well with traditional manufacturing processes, it is critical that
the vendor and product providers collaborate to find problematic areas as well as
shared solutions and best practices. To guarantee that the current framework
ultimately improves efficiency, performance indicators must be identified. The use of
performance enhancement standards at all stages and levels of development means
that supply chains fulfil the anticipated functional criteria while also providing the
appropriate guidance for quality improvement. The manufacturer’s priorities must
be supported by performance evidence that cascades from the highest operational
level to the lowest acceptable level. It is critical that the smaller indicators at each
level represent the duties assigned at that level while still adding to the organization's
total operating measure [8].


2.2.1.3 Quality assurance 


Analytical tools, including simulation and statistical evaluation, play a role in
analysing productivity through examination of their output reports. Advanced
knowledge could also comparatively analyse existing system information,
recognize correlations among different system phases and inputs, and refine the
components that have the greatest effect on yield and productivity [12]. Replacing
old-fashioned manufacturing processes with machine-learning-based smart manufacturing
processes can result in a slight to huge increase in productivity and profit. However,
a reasonable question is how quality and performance assurance are impacted
by these radical changes. The quality management roadmap establishes
benchmarks for enhancing the quality of production processes through procurement
partnerships with and within individual supplier providers. When a critical occurrence
happens, it notifies human operators, allowing them to take immediate steps if
possible. In the case of human-robot collaboration, time has taught us that humans may
be vulnerable to many types of exploits, and a knowledge base already exists for this
type of exploitation. However, the second part of the equation is new to
manufacturing processes, and a malicious individual seeking to damage the smart
manufacturing facility can find various ways of exploitation by attacking the machine
learning algorithm behind the robot that cooperates with the human.


2.2.1.4 Control-safety and secure AI


Since human-robot collaboration has become a core part of modern smart
manufacturing, we can categorize as robots the multiple IoT devices that are involved
in manufacturing processes. In this context, frequently performed tasks require heavy
parts to be lifted, various metallic and non-metallic components to be machined and
large plates to be connected to one another. These tasks are carried out by big and
strong devices, such as robotic manipulators, which present a severe safety threat to
humans. Multiple safety procedures, such as locking the machines in physical or
simulated cages and keeping humans at a safe distance while the robots are in action,
have already been introduced. However, in response to the new conditions for
modern automotive and manufacturing purposes, a new version of ISO 10218 [14],
the key safety specification for robotic systems, has been
created.

In the context of incorporating safety standards for autonomous or collaborative 
robots working with humans [12], the proposed rules for operating in a cooperative 
mode also include the following: 


• Stopping functions (10218-1): requirements are specified for how and when
the robot should perform protective or emergency stops when humans are
in the robot’s workspace [14].

• Speed and position control (10218-1): requirements are specified for the
maximum allowable speeds of robot arms and end effectors when humans
are in the robot’s workspace [14].

• Power and force control (10218-1): requirements are specified for the
maximum allowable power and forces applied by robot arms and end effectors
when humans are in the robot’s workspace [14].

• Design of collaborative operation workspaces (10218-2): requirements are
specified for the layout design of workspaces around the robot, including
safeguarded spaces (where humans are separated from the robot and protected
by safeguards) and collaborative spaces where humans are not separated from
the robot and hence the robot shall apply the control limits [14].

• Collaborative operation modes (10218-2): requirements are specified for
the specific operating modes that must be designed into the robot’s control
function when collaborating with a human in the collaborative workspace,
including teaching modes and autonomous modes [14].
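
As a minimal sketch of how the speed and force rules listed above could be checked
inside a control loop, the following Python snippet decides, for one control cycle,
whether a protective stop is required. The names `CollaborativeLimits` and
`required_action` and the numeric limits are hypothetical placeholders chosen for
illustration; they are not taken from ISO 10218, and real limits follow from the
standard and from the risk assessment of the specific robot cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollaborativeLimits:
    # Hypothetical placeholder values, not figures from ISO 10218.
    max_speed_m_s: float = 0.25
    max_force_n: float = 140.0


def required_action(human_in_workspace: bool, speed_m_s: float,
                    force_n: float, limits: CollaborativeLimits) -> str:
    """Return 'continue' or 'protective_stop' for one control cycle."""
    if not human_in_workspace:
        # In this sketch the collaborative limits apply only while a
        # human is inside the robot's workspace.
        return "continue"
    if speed_m_s > limits.max_speed_m_s or force_n > limits.max_force_n:
        return "protective_stop"
    return "continue"
```

The sketch only covers the decision itself; detecting the presence of a human and
executing the stop are outside its scope.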


2.2.2 Human-Robot Collaboration and IoT Devices


While smart manufacturing evolves radically and shifts quickly to the new era of
machine learning and human-robot collaboration, concern for physical security grows
alongside it. The complexity and configuration of robots and IoT devices make
scaling the technologies developed within this concept extremely dangerous. Despite
the clear benefits of incorporating robotics in smart manufacturing, most
deployments lack built-in security defenses, which makes robots unreliable and
vulnerable to cyber-attacks. This is one of the reasons why collaborative robots are
still mostly used in testing and have not yet fully proven themselves in the smart
manufacturing market. Although it is not an easy job, many guidelines should be
followed from the start to boost the cybersecurity of robots and IoT systems [15],
such as: securing every phase of device development, encrypting robot
communications, keeping networks up to date, limiting access to authorised users,
offering ways to restore a robot to a secure factory default state, implementing
cybersecurity guidance, including cybersecurity training for machine operators and
administrators, allowing users to report potential bugs, and encouraging security
assessments prior to production.


2.2.2.1 Towards trustworthy smart manufacturing processes 


In smart manufacturing environments, devices can participate in the sensing process
and upload their contributions to the backend decision-making system (or to one
running at the Mobile Edge Computing (MEC) layer). Raw sensor data are collected on
sensor devices and processed by local analytic algorithms to produce consumable data
for requesting applications. In this context, for a specific time window with $n$
time steps and $m$ sensors, we consider a dataset $D$ containing a sequence $S_j$
for each sensor $j$, where $S_j = [v_{1,j}, v_{2,j}, \ldots, v_{i,j}, \ldots, v_{n,j}]$.


Threat Model: The aim of adversarial agents is to mislead the smart manufacturing
processes into considering malicious measurement values as legitimate in their
services. To this end, an adversary may change the input value $v_{i,j}$ in $S_j$
to $v'_{i,j}$, where $v'_{i,j} \neq v_{i,j}$, to maximize the distortion:

$$\max \{ |v'_{i,j} - v_{i,j}| \} \qquad (2.1)$$

where the distortion should be lower than a maximum allowed value considered by the
adversarial agent.
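
The threat model can be illustrated with a short Python sketch in which an
adversary shifts a single value of a sensor sequence by the largest amount its
self-imposed bound allows. The function names, the parameter `max_distortion`, and
the choice to treat the bound as inclusive are assumptions made for illustration
only.

```python
def perturb_sequence(sequence, index, max_distortion, push_up=True):
    """Return a copy of `sequence` with the value at `index` shifted.

    The shift has magnitude `max_distortion`, i.e. the largest distortion
    |v' - v| that still respects the adversary's (inclusive) bound.
    """
    if max_distortion < 0:
        raise ValueError("max_distortion must be non-negative")
    tampered = list(sequence)
    shift = max_distortion if push_up else -max_distortion
    tampered[index] = sequence[index] + shift
    return tampered


def distortion(original, tampered):
    """Largest absolute per-sample difference between two sequences."""
    return max(abs(t - o) for o, t in zip(original, tampered))


readings = [20.0, 20.5, 21.0]                  # one sensor, three time steps
tampered = perturb_sequence(readings, 1, 0.5)  # second value raised by 0.5
```

A defender who knows nothing about the bound only sees a sequence that still looks
plausible, which is why the distortion is kept small.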

There are two primary adversarial attack models [1, 4]: (1) pre-training (poison- 
ing) attacks, and (2) post-training (evasion) attacks. In pre-training attacks, adver- 
saries try to inject malicious data in an attempt to poison the training dataset and, 
thus, decrease the classification accuracy of the classifier. In the post-training attack
scenarios, adversaries aim at misleading trained classifiers into misclassifying
samples in line with a malevolent intent. Let $f(x_i) = y_i$ be the mapping function
that maps an input $x_i$ to an output $y_i$. For every newly sensed value $x_i$,
$f$ gives a new output $f(x_i) = y_i$, and we have the following cases:


• True Positive: if $x_i$ is positive and $f$ correctly outputs positive, there is
no loss on the application.

• False Positive: if $x_i$ is negative and $f$ outputs positive, there is a loss $e$
on the application.

• False Negative: if $x_i$ is positive and $f$ outputs negative, there is a loss $l$
on the application.

• True Negative: if $x_i$ is negative and $f$ correctly outputs negative, there is
no loss on the application.


In principle, a machine learning technique tries to minimize $|f(x_i) - y_i|$, which
means minimizing $l$ and $e$. On the contrary, an adversarial attacker attempts to
maximize the impact of the attack by maximizing $|f(x_i) - y_i|$.
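
The four cases can be turned into a simple loss count. The sketch below is a
minimal illustration that assumes boolean labels (True meaning positive), a cost
`e` for each false positive and a cost `l` for each false negative; the function
name and the example costs are hypothetical.

```python
def application_loss(labels, predictions, e, l):
    """Sum the application loss over paired (label, prediction) booleans.

    A false positive costs `e`, a false negative costs `l`; true positives
    and true negatives cost nothing.
    """
    total = 0.0
    for truth, predicted in zip(labels, predictions):
        if predicted and not truth:      # false positive
            total += e
        elif truth and not predicted:    # false negative
            total += l
    return total


labels = [True, False, True, False]
predictions = [True, True, False, False]   # one TP, one FP, one FN, one TN
loss = application_loss(labels, predictions, e=1.0, l=5.0)
```

A learner tries to drive this sum towards zero, while an adversary tries to push
it up, typically by causing the more expensive kind of error.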


2.3 Cybersecurity Posture of AI-Fueled
Manufacturing Ecosystem 


Security in smart manufacturing does not stop at the physical safety of the workers.
This radical change may increase safety for the workers, but it will also create
many information security gaps. Given the different networking and application
layers involved in this change, many new vulnerabilities and attack paths are
emerging. In view of these threats, confidentiality and integrity must be ensured
in such environments.

On the way towards such IoT-based SoS, this added richness and connectivity also
poses a significant risk. The new SoS approach will potentially leave the network
vulnerable, offering malicious users a huge range of attack paths. Furthermore, in
the smart manufacturing environment, this risk is largely underrated. Between
April 2012 and January 2014, over 500,000 computing devices in industrial control
system ecosystems were discovered, as per Project SHINE data [16]. Since the
installed base of smart manufacturing systems is far smaller than that of
conventional industrial equipment, and relatively few attacks on them have been
reported, the owners of such installations may not feel alarmed. However, it is
worth noting that the absence of recorded attacks on such recently deployed systems
does not imply a lack of vulnerabilities. It is only a matter of time before the
hacker community acquires the basic information needed to initiate successful
attacks [17]. The most recent and severe assault on industrial infrastructure was
the power grid attack in Ukraine in December 2015 [18]. The attackers used a
combination of techniques such as malware, denial of service, and phishing to bring
the electricity supply infrastructure to a point where it became difficult to
restore, resulting in power failures across the country. These outages caused
several blackouts, affecting 225,000 customers across Ukraine. Because the advanced
manufacturing ecosystem is still young, it is not surprising that there have not
been many incidents involving Industry 4.0 systems. However, major attacks have been
launched against some of the more cutting-edge smart manufacturing technologies,
most noticeably IoT. A typical IoT node combines a relatively weak CPU with wireless
network interfaces, inviting attackers to target it directly within its radio range.
This contradicts the conventional security paradigm, where there is a well-defined
perimeter and dedicated devices (such as firewalls and intrusion prevention systems)
are responsible for protecting the boundary. Instead, each system has to be at least
partially responsible for its own protection, a task made more difficult by the
limited processing capabilities of a standard IoT node. Naturally, this is
exacerbated by manufacturers’ failure to recognize the broad implications of
inadequately securing individual devices. The high-profile IoT botnet Mirai [19],
which resulted in the biggest denial of service attack seen so far, is a glaring
example of this failure.

On the research side, the most promising direction, and the one that has received
the most effort in the last couple of years, is AI-based cyber defence mechanisms
that are decentralized and can classify various attack vectors more dynamically.
Many efforts have been made, many algorithms have been developed, and machine
learning classification models for cyber defence have become more sophisticated and
have improved dramatically in recent years. According to Sturm et al. (2014) [20], a
void in a 3D-printed component can lead to a reduction in yield, as well as to other
physical alterations such as changes in weight, stiffness, and attenuation
coefficient. Anomaly detection can also detect unusual behaviour on a network or
system (Kim et al. 2013) [21], in images (Chandola et al. 2009) [22], in supervisory
control and data acquisition (SCADA) systems (Garcia et al. 2011) [23], or for
preventive equipment maintenance (Rabatel et al. 2011) [24]. It addresses the
problem of finding patterns in data that do not match the expected pattern
(Chandola et al. 2009). The concept is to identify patterns of normal behaviour that
the algorithm has learned or has been given. Administrators are notified if an
activity deviates from the predetermined or accepted model of behaviour. Compared to
existing methods, anomaly detection has the benefit of being able to detect
previously unseen malicious activity. That being said, adversarial machine learning
does not fall into the category where the attacker attacks the physical machine or
the nodes where the AI agents are
operating. In this case, the attacker tries to bypass or manipulate the classification
model that has been created, executing the real attack in a stealthy manner without
being detected by the classification model. According to Kumar et al. (2020), it is
unclear how machine learning vulnerabilities can be rated in terms of risk and
impact. When a security specialist sees headlines about an attack, the natural
question is usually “Is my company impacted by the attack?”, and organisations today
lack the means to scan an ML environment for known adversarial ML vulnerabilities.
In this recently adopted classification, three kinds of attacks are considered:
poisoning, stealing, and evasion. The overarching aim of these attacks is to
maximize the classifier’s generalization error and potentially steer the
decision-making mechanism towards harmful outcomes desired by the attacker, as
stated by Chen Li and Jiliang Zhang (2019) [25].


2.3.1 Poisoning Attacks 


In the first scenario, the adversary contaminates the training data. To do this, the
adversary crafts and injects data points that reduce classification accuracy. This
attack has the potential to totally alter the classification mechanism during the
training phase, allowing the attacker to steer the system’s classification in
whatever way they see fit, as noted by Vahid Behzadan and Arslan Munir (2017) [26].
The extent of the classification error is determined by the data the perpetrator
uses to poison the training. The backdoor or Trojan attack, for example, is an
especially sophisticated attack in this class, in which the attacker deliberately
poisons the model by adding a backdoor key so that it performs well on normal
training data and testing samples but misbehaves when the backdoor key is used.

Model stealing, in turn, usually targets machine learning models that are kept
confidential from the outside world but are exposed through an API that is open to
the public. As an example, consider ML-as-a-service systems: many allow individuals
to train models on highly sensitive data and charge others on a pay-per-query basis
for their use. The tension between model confidentiality and public access
motivates the research on model extraction and stealing attacks. In these types of
attacks, an intruder with black-box access but no prior knowledge of an ML model’s
parameters or training set tries to reproduce the model, i.e., to “steal” it.
ML-as-a-service offerings, unlike traditional learning theory settings, may accept
partial feature vectors as inputs and return confidence values with predictions.
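
Returning to the poisoning scenario described at the start of this section, its
effect can be shown on a deliberately tiny example. The sketch below assumes a
one-dimensional nearest-centroid classifier whose decision threshold is the midpoint
of the two class means; the classifier, the data, and all names are illustrative
assumptions, not a method taken from the cited works.

```python
from statistics import mean


def fit_threshold(samples):
    """Fit a 1-D nearest-centroid classifier.

    `samples` is a list of (value, label) pairs with label "normal" or
    "attack". The decision threshold is the midpoint of the class means.
    """
    normal = [v for v, label in samples if label == "normal"]
    attack = [v for v, label in samples if label == "attack"]
    return (mean(normal) + mean(attack)) / 2


def classify(value, threshold):
    """Label a value as "attack" when it lies above the threshold."""
    return "attack" if value > threshold else "normal"


clean = [(1.0, "normal"), (2.0, "normal"), (3.0, "normal"),
         (9.0, "attack"), (10.0, "attack"), (11.0, "attack")]
# Poisoning: the adversary injects attack-like values labelled "normal".
poison = [(10.0, "normal"), (10.0, "normal"), (10.0, "normal")]

clean_threshold = fit_threshold(clean)               # (2 + 10) / 2
poisoned_threshold = fit_threshold(clean + poison)   # (6 + 10) / 2
```

With the clean training set a reading of 7.5 is classified as an attack; after
poisoning the threshold has moved and the same reading passes as normal.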


2.3.2 Evasion Attacks 


Moreover, during the testing (inference) phase, the adversary can conduct an evasion
attack against the classifier, resulting in an incorrect machine interpretation. In
this case, the adversary’s goal is to have some data misclassified in order to, for
example, stay stealthy or imitate some favourable behaviour. In network anomaly
detection, an intrusion detection system (IDS) can be evaded by encoding the attack
payload in such a manner that the intended target can read it but the IDS cannot,
which amounts to a misclassification. As a result, the perpetrator can damage the
targeted device without being detected by the IDS. Another goal of the intruder may
be to induce concept drift in the system, resulting in persistent re-training and
dramatically degrading its performance.

The primary aim of this type of adversarial machine learning is to reduce the
performance of the classification process that is based on machine learning. For
classification problems, this can be interpreted as an increase in false positives,
in false negatives, or in both. For clustering problems, the aim is generally to
reduce accuracy.


• False positives: In classification problems, such as spam detection, where
there are two states (spam or normal), the aim of an attacker may be to
make the targeted system falsely label many normal data as falsified data.
This would lead the decision-making system to miss crucial information.

• False negatives: Using the same example, if the attacker aims to increase the
false negatives, then many falsified data will be labelled as legitimate.

• Both false positives and false negatives: Here, the attacker aims to reduce
the overall confidence of the user in the decision-making process by letting
falsified data go through and by filtering out legitimate data.

• Clustering accuracy reduction: Compared to classification, the accuracy of
clustering is less straightforward to evaluate. Here, we include a general
reduction of accuracy as the overall aim of the attacker of a clustering algorithm.
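
To make the false-negative case concrete, the sketch below shows an evasion attempt
against a linear detector that flags a sample when its score is positive. It
assumes a white-box attacker who knows the weights and bias, which is a strong
assumption; the function names and the `margin` parameter are hypothetical.

```python
def score(weights, bias, x):
    """Linear detector score; a sample is flagged when the score is > 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def evade(weights, bias, x, margin=0.5):
    """Shift `x` against the weight vector until its score equals -margin.

    Returns a copy of `x` unchanged if the detector does not flag it anyway.
    Assumes at least one weight is non-zero.
    """
    s = score(weights, bias, x)
    if s <= 0:
        return list(x)
    norm_sq = sum(w * w for w in weights)
    step = (s + margin) / norm_sq
    return [xi - step * w for xi, w in zip(x, weights)]


weights, bias = [1.0, 1.0], -4.0
malicious = [3.0, 3.0]                    # score 2.0, flagged by the detector
evasive = evade(weights, bias, malicious)
```

The perturbed sample moves just far enough across the decision boundary to be
labelled legitimate, which is the smallest change (in the Euclidean sense) that
achieves the chosen margin for a linear score.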


2.4 Trustworthiness of Data Input to Machine 
Learning Algorithms 


“AI Is Only as Good as the Data You Feed It” is a well-known phrase in the AI
community, and it holds true from a technical perspective. AI solutions, and
especially the latest Deep Neural Network (DNN) setups, are very efficient at
capturing patterns in data in both supervised and unsupervised ways. In this regard,
an AI system that is instantiated with a specific training set inherits the
intrinsic characteristics of that data. Hence, if a biased training set (within a
given context) is used, the trained AI system will gain only partial knowledge of
the context for which it was trained. This may result in poor performance during
the actual deployment of the system in practice. This is just one indication of the
implications that may emerge from poor data quality.

However, apart from the quality of the data, the aim of this section is to highlight
the importance of the trustworthiness of the data that are fed into AI systems.
Following the same mindset, we argue that “AI Is Only as Trustworthy as the Data You
Feed It”. In the context of adversarial machine learning and, more specifically, of
poisoning and evasion attacks, the community has witnessed a series of incidents at
both stages of the machine learning pipeline (training and production), where
attackers try to hijack the training process or to evade the inference process of
AI systems. In both cases, the attackers inject small perturbations into the data
that are just enough either to lead to a faultily trained system or to fool the
system at the inference stage.

It becomes clear that, in order to safeguard AI systems, we need not only to enhance
the robustness of the AI models per se, but also to deploy additional techniques
that can guarantee the operational assurance of the components taking part in the
data processing pipelines of AI systems. Thus, we argue that beneficial techniques,
such as Adversarial Training or Defensive Distillation [1], can be complemented even
further by solutions that can offer verifiable evidence of the provenance and
integrity of the data, and of the legitimate operational state of the data
generators. Especially in the case of smart manufacturing, where multiple
heterogeneous devices support different production lines that generate diverse data
flows, it is crucial to identify these roots of trust.

In the context of smart manufacturing, attestation can be used as a solution to
guarantee the operational assurance of systems and, to a certain extent, as the
root of trust for the generated data flows.

In particular, heterogeneous components must be enabled to make and prove statements
about the integrity of the data they produce, so that other components can align
their actions appropriately and an overall system state can be assessed. This goes
substantially beyond simple authorization schemes telling who may access whom; it
requires an understanding of the semantics of requests and of chains of effects
throughout the system, and an analysis both statically at design time and
dynamically at runtime.


2.4.1 Attestation for the Trustworthiness of Data Generators


Remote attestation is an efficient mechanism to provide evidence of the integrity
status of a remote component. It is typically realized as a challenge-response
protocol that allows a trusted party (verifier) to obtain an authentic and timely
report about the state of an untrusted, and potentially compromised, remote device
(prover). A prominent root of trust to enable attestation is the Trusted Platform
Module (TPM). The TPM makes it possible to implement remote attestation protocols in
such a way that the anonymity of the platform is protected. Remote attestation
services are currently used in a variety of privacy-preserving scenarios, ranging
from attestation for isolated execution environments based on Intel’s now outdated
Trusted Execution Technology [27], to more modern approaches used in conjunction
with Intel’s Software Guard Extensions, e.g., [28, 29].

From a high-level perspective, a remote attestation protocol requires that the
prover creates an Attestation Key (AK) via the TPM, which is an asymmetric key pair
used for signing quotes. A quote is a report of the contents stored in selected
Platform Configuration Registers (PCRs) of the TPM, digitally signed with the AK,
i.e., a signature of the platform state. In order to preserve anonymity, the prover
can create as many AKs as it wishes, but each AK must be certified by a trusted
third party called the Privacy Certification Authority (PCA). A verifier can trust
the platform if it successfully verifies that a quote is a valid signature over the
expected PCR values with a certified AK.
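
The challenge-response flow can be sketched in a few lines of Python. This is a
simulation under stated assumptions, not TPM code: the PCR is modelled as a SHA-256
hash chain, a fresh nonce plays the role of the verifier’s challenge, and an HMAC
over a shared key stands in for the asymmetric AK signature, since the standard
library offers no signature scheme. AK certification by the PCA is not modelled,
and all function and variable names are hypothetical.

```python
import hashlib
import hmac
import secrets


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """PCR extend: new = SHA-256(old || SHA-256(measurement))."""
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()


def make_quote(key: bytes, pcr: bytes, nonce: bytes) -> bytes:
    """Prover side: authenticate the PCR value together with the nonce.

    HMAC is a stand-in for the AK signature (assumption, see lead-in).
    """
    return hmac.new(key, pcr + nonce, hashlib.sha256).digest()


def verify_quote(key: bytes, expected_pcr: bytes, nonce: bytes,
                 quote: bytes) -> bool:
    """Verifier side: recompute the quote over the expected PCR value."""
    expected = hmac.new(key, expected_pcr + nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, quote)


key = secrets.token_bytes(32)        # stands in for a certified AK
boot_chain = [b"bootloader-v1", b"kernel-v5", b"sensor-agent-v2"]

# Verifier's reference value, computed from known-good measurements.
expected_pcr = bytes(32)
for component in boot_chain:
    expected_pcr = extend(expected_pcr, component)

nonce = secrets.token_bytes(16)      # fresh challenge from the verifier
honest_quote = make_quote(key, expected_pcr, nonce)

# A prover whose last component was modified ends up with a different PCR.
tampered_pcr = bytes(32)
for component in boot_chain[:-1] + [b"sensor-agent-TAMPERED"]:
    tampered_pcr = extend(tampered_pcr, component)
tampered_quote = make_quote(key, tampered_pcr, nonce)
```

The nonce ties each quote to one challenge, so a quote recorded earlier cannot
simply be replayed in answer to a later challenge.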

The aforementioned process is the pillar of the trusted computing field for
establishing trust among different TPM-enabled entities. The benefits of this
solution have led to the realisation of numerous attestation approaches, while
several implementations and research endeavours have emerged with a particular focus
on IoT environments. More specifically, leveraging cryptographic techniques for
protecting and proving the authenticity and integrity of computing platforms, and in
turn of the data stemming from those platforms, has resulted in a rich scientific
field. Integrity and authenticity are two indispensable enablers of trust. Whereas
integrity provides evidence about correctness, authenticity provides evidence of
provenance.

Typical attestation solutions measure the load-time integrity of user-space
applications and of files read by the root user during runtime. This is the
Binary-Based Attestation (BBA) scheme proposed by TCG, where measurements and
attestation consider hashes of binaries. Other solutions focus on the attestation of
only a set of critical properties of the attested devices in order to provide more
efficient and flexible schemes on the basis of Property-based Attestation (PBA)
[30]. The aforementioned schemes offer a rather static assertion on the integrity
of a platform and its configuration. To tackle this limitation, Control-flow
Attestation (CFA) solutions suggest the acquisition of measurements that reflect the
run-time behaviour of a process in order to detect attacks that try to divert the
legitimate execution behaviour of a system during runtime.
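
The binary-based scheme can be sketched as a comparison of hashes against
known-good reference values. The following is a minimal illustration in which
binaries are represented as in-memory byte strings; the component names and the
`measure` and `attest` helpers are hypothetical, and no real TCG interface is used.

```python
import hashlib


def measure(binary: bytes) -> str:
    """Binary-based measurement: the SHA-256 digest of the binary."""
    return hashlib.sha256(binary).hexdigest()


def attest(loaded: dict, reference: dict) -> list:
    """Return the names of binaries whose hash is unknown or has changed."""
    return sorted(
        name for name, binary in loaded.items()
        if reference.get(name) != measure(binary)
    )


known_good = {"plc-driver": b"driver build 1", "vision-agent": b"agent build 3"}
reference = {name: measure(blob) for name, blob in known_good.items()}

loaded = dict(known_good)
loaded["vision-agent"] = b"agent build 3 patched by attacker"
failed = attest(loaded, reference)
```

The check is static in the sense described above: it detects a modified binary at
load time but says nothing about how an unmodified binary behaves while running,
which is the gap that control-flow attestation aims to close.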

Considering the above, AI-enabled and IoT-based smart manufacturing industries can
take advantage of remote attestation mechanisms in order to establish trust among
all the components that operate collaboratively in a manufacturing process. With
indisputable evidence of the configuration and/or runtime integrity of shop floor
devices, the cyber-attack surface is greatly reduced and trust is established among
devices on the shop floor. More specifically, in order to guarantee the integrity
and correctness of data, property-based attestation [30] seems to be the perfect
fit. By identifying the exact properties that need to be attested on manufacturing
systems, a PBA mechanism can guarantee the operational assurance of the components
responsible for generating the data that are fed into the data pipeline of AI
systems.

Attestation can ensure that the data sent from one device to another has not been
tampered with, and this can be ensured in all data processing phases, i.e., during
transport and during generation or processing on the originating device [31].
Attestation can be used as a provenance mechanism, as data exchanged between devices
in a network can be authenticated along with a proof of integrity of all software
involved in its generation and processing. The strategy used in [31] to achieve this
was to decompose the software of embedded devices into simple interacting modules,
reducing the amount and complexity of software that needs to be attested, i.e., only
those modules that process the data are relevant.

In the context of AI-fuelled smart manufacturing, where the trustworthiness of data
is a crucial requirement that needs to be met, remote attestation seems a viable
solution to guarantee the integrity of data and minimize the possibility of
adversarial attacks against AI systems.


2.5 Discussion and Critique 


Cyber defense in the manufacturing industry is divided into two categories: static
defense and active defense. Static defense methods are centered on adhering to
common industrial rules and specifications. Cryptographic countermeasures, intrusion
detection and prevention systems, personnel training, and incident response
management are examples of active defense mechanisms. Although static defense is a
vital step toward improving the overall security posture, it is relatively simple,
so its specifics are not covered further here. Manufacturing and smart manufacturing
environments contain hundreds or even thousands of devices, the majority of which
are Internet of Things (IoT) devices. Cryptographic primitives are well known and
broadly used in systems to ensure data confidentiality and integrity. Symmetric
encryption algorithms, public key infrastructure (PKI), hybrid encryption schemes,
cryptographic hash functions, and digital signatures can secure the integrity of
the data, can be used for authentication, and provide non-repudiation (a sender
cannot deny the authenticity of a message sent to the recipient), among many other
aspects of security.

Another cyber defense mechanism is intrusion detection systems in smart
manufacturing network-based environments, which are categorized into Host-based IDS
and Knowledge-based IDS [34, 35]. Host-based IDS gather data on single hosts,
whereas Knowledge-based IDS accumulate information about previous security flaws
and look for patterns to detect intrusions. Both of these mechanisms rely on
signature-based detection, whose basic limitation is that it cannot easily capture
zero-day exploits and newly introduced attacks.

Given the complexity of modern systems and of the smart IoT devices used in smart
manufacturing environments, traditional machine learning systems and models
(tree-based, Bayesian, SVMs, etc.) operate on input data (e.g., network data, images
from robots, sound data, coordinates) collected mainly at the network endpoints
monitored by the system. Based on this input they can take a number of decisions
(e.g., alert the system administrator, raise an incident) using classification
models which, however, can be considered limited (Zhang et al. (2019) [25],
Banerjee et al. (2018) [4] and Meng Qu et al. (2018) [32]), because they do not take
advantage of an enhanced understanding of events that may happen in other parts of
the network, and because they are not suited to aggregating heterogeneous
neighbours with different content features. By features we refer to the features
extracted from monitoring and processing collected network and host-based data that
can be used in the classification of specific attack vectors. More specifically,
there is no correlation of data acquired by different individual sources. In the
industrial sector, and even in the scientific literature, deep learning, for
example, has largely been applied to datasets in which the training data are:
(i) independent of each other, and (ii) homogeneous, i.e., the subjects of the
classification or regression are instances of the same entity type, whereby each
field in the schema has a consistent interpretation and format. Thus, there is a
need to develop more accurate classification models for detecting a wider range of
attacks, based on the classification of malicious and benign network traffic, in
combination with advanced AI.


2.6 Outlook - Road Ahead 


Entities in smart manufacturing infrastructure are most probably heterogeneous and
endowed with characteristics that change dynamically over time as a result of their
successive interactions. To apply deep learning to such entities, for example for
classification, one must first incorporate these interactions into the feature
engineering process in a structured manner. The aim of these research questions is
to demonstrate that, in this regard, the present state of graph machine learning is
insufficient and needs to be supplemented with a rigorous feature engineering
framework in space and time. Zhang et al. (2019) [25] provide enough evidence to
challenge the notion that traditional machine learning methods are suitable for
creating the most complete and concrete classification model. Also, in the H2020
STAR project, the approach investigated to overcome such challenges and limitations
uses graph machine learning and LSTM, which shows promising results; nonetheless,
there are still a number of open challenges to consider, especially related to the
order of monitored events and the time they are present in the system. We use this
basis and state-of-the-art machine learning methods to address the most dangerous
threat to traditional machine learning classification models, the “concept drift”
attack. In smart manufacturing, “concept drift” attacks can take many forms; one
example is the temperature of a critical room where IoT sensors are present. The
attacker can manipulate the classification model, shifting its notion of normal
behaviour, by increasing the temperature of the room very slowly, thus impacting the
manufacturing environment and causing huge damage to the machines.

Graph machine learning enhances the knowledge given to the classifier by using
different types of data produced by neighbouring endpoints, as well as the
interactions of the neighbours with other devices and endpoints (entities). This is
the difference between the intrinsic and extrinsic features on which the
classification is based. Each of these entities has properties, i.e.,
characteristics, that are inherent to it. It should be noted that these intrinsic
properties are often transient and therefore necessitate a sequential treatment.
Extrinsic characteristics, on the other hand, emerge from the entities’ relations
with one another and are influenced by different environmental parameters. When
entities communicate, their extrinsic properties, which like the intrinsic ones are
dynamic, must be updated to reflect the changing probability that any particular
entity bears. The combination of intrinsic and extrinsic features enhances the
knowledge of the classifier, and this is the benefit that graph machine learning
offers over traditional machine learning methods.

A more specific and novel solution to the above is the use of bipartite graphs for
hypergraph machine learning. The solution requires a combination of bipartite graph
models with advanced AI LSTM (Long Short-Term Memory) agents. The ability of LSTMs,
introduced by Hochreiter et al. [33], to learn on data with relationships and with
long-range temporal dependencies makes them a well-suited technology for phenomena
with spatial and temporal characteristics, such as time series prediction, machine
translation, speech recognition, and language processing. Since there can be lags of
unknown duration between significant events, LSTM is useful for processing,
classifying, and drawing conclusions based on time series data. The reason for
using this specific type of ML-based agent is that it takes into account the time
dependency, which is crucial in cybersecurity attacks. Based on the use of LSTMs, a
new classification framework can be designed to demonstrate gains in efficiency and
accuracy compared to traditional classifiers. The problem most commonly reported for
traditional AI methods is their inability to successfully flag “concept drift”
attacks. In these attacks, the attackers manipulate the data slightly as time goes
on, which can defeat the ability of traditional AI methods to successfully classify
an attack. Thus, the use of LSTM is imperative for creating the right framework.
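
The slow temperature manipulation described in this section can be reproduced with
a small simulation. The sketch below is not the graph or LSTM approach discussed
here; it only contrasts a detector that re-baselines on recent readings with a
check against a fixed, trusted reference, to show why gradual drift escapes the
former. The step size, window, tolerance, and function names are assumptions chosen
for illustration.

```python
from statistics import mean


def sliding_window_alerts(readings, window, tolerance):
    """Indices flagged by a detector that re-baselines on recent readings."""
    alerts = []
    for i in range(window, len(readings)):
        baseline = mean(readings[i - window:i])
        if abs(readings[i] - baseline) > tolerance:
            alerts.append(i)
    return alerts


def fixed_baseline_alerts(readings, baseline, tolerance):
    """Indices flagged when compared against a trusted reference value."""
    return [i for i, value in enumerate(readings)
            if abs(value - baseline) > tolerance]


# The attacker raises the temperature by 0.5 degrees per time step.
drifting = [20.0 + 0.5 * step for step in range(21)]

adaptive = sliding_window_alerts(drifting, window=4, tolerance=2.0)
anchored = fixed_baseline_alerts(drifting, baseline=20.0, tolerance=2.0)
```

The adaptive detector never raises an alert because each reading is close to its
immediate predecessors, even though the room ends up ten degrees warmer; the
anchored check flags the drift from the sixth reading onwards. Models that keep
long-range temporal context aim to capture this kind of cumulative change without
relying on a hand-picked reference value.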


2.7 Conclusions 


This chapter focused on AI adversarial tactics against smart manufacturing in order
to identify the gaps that enable cyber attackers to manipulate AI systems. As such
systems have become an integral part of modern production lines, supporting a wide
range of operations from predictive maintenance to safe human-robot collaboration,
among others, they have attracted the interest of attackers. In this direction, this
chapter offered a review of the current status of the smart manufacturing domain by
highlighting the emerging threats and its overall security posture. In this context,
we elaborated on the emerging threats of poisoning and evasion attacks against AI
manufacturing systems and on how attestation mechanisms can be used to guarantee the
trustworthiness of generated data in the manufacturing domain. This analysis led to
a discussion of the road ahead, which documented the benefits of including graph
machine learning and LSTM in building robust AI setups for smart manufacturing.
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Chapter 3 


Knowledge Modelling and Active 
Learning in Manufacturing 


By Jože M. Rožanec, Inna Novalija, Patrik Zajec,
Klemen Kenda and Dunja Mladenić


The increasing digitalization of the manufacturing domain requires adequate 
knowledge modeling to capture relevant information. Ontologies and Knowledge 
Graphs provide means to model and relate a wide range of concepts, problems, 
and configurations. Both can be used to generate new knowledge through deduc- 
tive inference and identify missing knowledge. While digitalization increases the 
amount of data available, much data is not labeled and cannot be directly used to 
train supervised machine learning models. Active learning can be used to identify 
the most informative data instances for which to obtain users’ feedback, reduce fric- 
tion, and maximize knowledge acquisition. By combining semantic technologies 
and active learning, multiple use cases in the manufacturing domain can be
addressed by taking advantage of the available knowledge and data.


3.1 Introduction


Digitalization enables collecting and storing data in a digital format. Digital data 
enables changes at the process, organization, and business domain levels [1]. 
In manufacturing, it enables increased process efficiency (with lower
performance variability and less unplanned downtime), a more efficient use of
resources (e.g., lower energy consumption), increased safety and sustainability,
higher product quality, and reduced product launch time [2-4]. Smart factories are built
based on three principles [2]: cultivate digital people, introduce agile processes, and 
configure modular technologies. [5] identifies four digitalization challenges in man- 
ufacturing: how to digitally augment human work, enable worker-centric knowl- 
edge sharing, create self-learning manufacturing workplaces, and enable mobile 
learning. These challenges and benefits were recognized by several national and 
international initiatives (Advanced Manufacturing (USA), Industry 4.0 (Germany 
and the European Union) [6], Made in China 2025, New Robot Strategy (Japan), 
New Industrial France, High-Value Manufacturing (UK), Make it Happen
(Australia)), and new paradigms were created to realize them. Among such paradigms, we
find Cyber-Physical Systems [7], Digital Shadows, and Digital Twins [8]. Cyber- 
Physical Systems were conceived as smart and embedded systems that result from 
the integration of physical and computational processes [9, 10]. In Digital Shad- 
ows, the data flow is unidirectional (from the physical counterpart to the digital 
replica), while in Digital Twins, this flow is bidirectional (changes in the digital 
object can lead to changes in the physical object) [11]. Multiple authors proposed
enhancing Digital Twins with cognitive capabilities by using a knowledge
graph [12, 13]. Such technologies, along with the Internet of Things and Artificial
Intelligence, bring added value into industrial value chains [14]. 

To capture data in a digital form, sensors and software, such as Enterprise
Resource Planning (ERP) or Manufacturing Execution Systems (MES), are
used. There are, however, many operational aspects and contextual information 
the employees are aware of that the sensors and the software systems mentioned 
above do not capture. Thus, it is essential to develop interfaces and mechanisms 
to gather such information while minimizing interaction friction with end-users. 
Some examples can be found in other domains, where, to mitigate the knowledge
gap, researchers developed conversational interfaces that identify missing
knowledge and ask the users to provide it [15, 16]. In such a context, semantic
technologies and active learning can play a crucial role. Semantic technologies
enable encoding domain knowledge (in ontologies and knowledge graphs) and provide
means to perform inference (considering rules and logics) [17]. On the other hand,
active learning allows one to choose the most informative pieces of data to gather
additional insights from experts.

This chapter discusses semantic knowledge representations (ontologies and
knowledge graphs), the usage of active learning, and use cases in the industrial
domain that benefit from both.


3.2 Semantic Knowledge Representations 


3.2.1 Ontologies in Manufacturing 


Ontologies are explicit specifications of a conceptualization (an abstract, simplified 
view of the world) regarding objects, concepts, and entities, and the relation- 
ships between them [18]. One of the main issues regarding knowledge manage- 
ment in the manufacturing domain is the wide range of concepts, problems, and 
configurations present [19]. A possible solution to this is the usage of seman- 
tic technologies. Ontologies provide a formal specification of a shared conceptu- 
alization in the domain of interest by defining concept hierarchies, taxonomies, 
and topologies [20, 21]. They provide information interoperability between dif- 
ferent domains and enable reasoning. Among the ontology use cases in manu- 
facturing mentioned in the literature, we find knowledge sharing and reuse in 
distributed manufacturing settings [22], linking between product assemblies and 
manufacturing resources using manufacturing operations [23], and production line 
processes [24]. Several ontologies were considered and developed in the manufac- 
turing domain. Upper ontologies provide high-level concepts that can be extended 
to create domain-specific ontologies. Among the upper ontologies we find the Basic 
Formal Ontology (BFO) [25], Suggested Upper Merged Ontology (SUMO) [26], 
Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE) [27], 
General Formal Ontology (GFO) [28], Object-Centered High-level Reference 
Ontology (OCHRE) [29], Politecnico di Milano—Production Systems Ontology 
(P-PSO) [30], Manufacturing Reference Ontology (MRO) [31], Manufacturing 
Systems Ontology (MSO) [32], and the Manufacturing System Engineering (MSE) 
ontology model [33]. 

The BFO ontology attempts to model time and space. To that end, it divides 
entities into two disjoint categories: continuants (something that exists at a point in 
time) and occurrents (something that is realized in time, e.g., processes and events). 
SUMO is considered the largest formal public ontology, mapping the whole Word- 
Net lexicon. It divides entities into two disjoint categories: physical (represents 
objects and processes) and abstract (represents sets, propositions, quantities, and 
attributes). The upper ontology is complemented with the MId-Level Ontology 
(MILO), and domain-specific ontologies are developed on top of them. DOLCE 
was developed to capture ontological categories underlying natural language and 
human commonsense. Its focus is to describe categories as cognitive artifacts, as
represented in human perception, rather than in the intrinsic nature of the world.
The ontology divides entities into two categories: endurants (continuants) and per- 
durants (occurrents). A mapping between BFO and DOLCE was proposed in [34]. 
GFO provides a different worldview, dividing entities into two categories: presen- 
tial and process. Presentials refer to entities that are entirely present at a given point 
in time. To model how presentials acquire different values in time but remain the 
same entity, they refer to persistents (a specific universal representing the presen- 
tials). The processes represent functions that have a temporal extension and cannot 
be wholly present only at a given point in time. The persistent-presential aspects 
are discussed in OCHRE under the terms of thick and thin objects, where the 
thick objects refer to aspects that change over time, while the thin object refers to 
core aspects that remain the same through time. While the ontology distinguishes 
between endurants and perdurants, it does so by modeling participation as a special 
case of parthood and avoids assuming two separate domains. P-PSO was designed 
as a meta-model to describe the manufacturing domain from an object-oriented 
perspective. When doing so, it considers three aspects of the manufacturing setting:
physical (the material definition of entities), technological (a functional view of
the system), and control (production operation procedures). The MSO evolves the P-PSO,
addressing a wider domain, built with a different purpose, and providing a dif- 
ferent approach to the control and visualization aspects. Regarding the domain, 
high-level classes are defined to address all types of industry, and specific classes 
are defined as specializations of such high-level objects. In particular, the ontology 
extends the scope to logistics and the process industry. In contrast to the P-PSO, 
which provides a general taxonomy but does not define a specific usage, MSO was 
designed for production system control. Finally, regarding control and
visualization aspects, P-PSO defines entities and relationships to be considered
for manufacturing control. In contrast, MSO provides definitions at a conceptual
level, assuming the ontology only interacts with different software and placing
its interest on the outcomes, without the need to represent the inner design and
workings of the software service. The MRO was designed as an upper manufacturing ontology,
defining the terminology based on existing standards. A different approach was 
adopted by the MSE ontology model, which provides a model to support information
autonomy and facilitate information exchange between inter-disciplinary
engineering design teams while leaving each team free to adopt its own
terminology. The aforementioned upper ontologies are considered when creating
domain-specific ontologies.
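
To make the continuant/occurrent split and the extension of an upper ontology more concrete, the following minimal sketch encodes a tiny class hierarchy and a disjointness check. The class names below the two top-level categories (Machine, MillingMachine, Process, MillingOperation) are hypothetical and do not come from any of the cited ontologies; a real project would rely on an OWL reasoner rather than hand-written code.

```python
# Hypothetical mini-taxonomy: class -> parent class (None marks the root).
PARENT = {
    "Entity": None,
    "Continuant": "Entity",
    "Occurrent": "Entity",
    "Machine": "Continuant",
    "MillingMachine": "Machine",
    "Process": "Occurrent",
    "MillingOperation": "Process",
}

# Pairs of classes declared disjoint: no individual may belong to both.
DISJOINT = [("Continuant", "Occurrent")]


def ancestors(cls):
    """Return all superclasses of `cls`, nearest first."""
    result = []
    parent = PARENT[cls]
    while parent is not None:
        result.append(parent)
        parent = PARENT[parent]
    return result


def is_a(cls, candidate):
    """True if `cls` equals `candidate` or is one of its subclasses."""
    return cls == candidate or candidate in ancestors(cls)


def violates_disjointness(asserted_classes):
    """True if an individual is asserted under two disjoint classes."""
    return any(
        any(is_a(c, left) for c in asserted_classes)
        and any(is_a(c, right) for c in asserted_classes)
        for left, right in DISJOINT
    )


print(is_a("MillingMachine", "Continuant"))
print(violates_disjointness({"MillingMachine", "MillingOperation"}))
```

The domain-specific classes inherit from the upper-level categories, so a constraint stated once at the top (disjointness) is enforced for every class added below it.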

Manufacturing always relates to a specific product, and authors developed
specific ontologies to describe them. [35] developed PRONTO (PRoduct ONTOlogy),
which defines concepts, relationships, and axioms mainly related to the
manufactured products’ structure. The ontology considers raw materials, how
those are assembled into a product, and derivative products. [36] noticed that
ontologies and standards aim to facilitate a common grounding by sharing
expert knowledge and finding agreement on a particular domain. They developed
the ONTO-PDM ontology based on existing models¹ from the IEC 62264
standard.

Products cannot be developed without a specific manufacturing process. [37] 
developed the Process Specification Language (PSL) ontology to describe manu- 
facturing processes throughout the manufacturing life cycle. [24] developed and 
applied a meta-model to describe a material-processing production line, which 
supports defining the behavior of an entity over time through state transitions. 
[38] developed the Manufacturing Resource Capability Ontology (MaRCO)
to describe the capabilities of manufacturing resources, concentrating solely on
machines and tools, so that it can be used to support semi-automatic system design
and auto-configuration of production systems. Another effort to describe
products, production processes and resources is the P2 ontology, developed by [39].
[40] describe a manufacturing ontology based on the DOLCE ontology and the
Adaptive Holonic Control Architecture for distributed manufacturing systems
(ADACOR) [41], which describes manufacturing scheduling and control
operations. Another view on scheduling was developed by [42], who introduced the
SIMPM (Semantically Integrated Manufacturing Planning Model) ontology,
modeling manufacturing planning tasks according to time, variety, and aggregation.
[43] presents the Supply Chain Operations Reference (SCOR) ontology to facilitate
the interoperation between applications involved in the supply chain. A product
data model in a cloud manufacturing context was developed by [44]. Finally,
additive manufacturing was the subject of several ontologies [45, 46].

Another relevant aspect of the manufacturing domain is sensors, which
enable data gathering. OntoSensor [47] aims to provide a broad knowledge base
of sensors for query and inference, based on the SensorML standard.² Along the same
lines, [48] proposed an ontology to describe sensors’ capabilities and operations.
[49] developed WISNO (Wireless Sensor Networks Ontology) to deduce high-level
information from low-level, implicit context and to check ontologies’ consistency.
[50] describes an ontology to characterize sensor capabilities and properties as the 
composition of their building blocks through three description levels: domain con- 
cepts, abstract sensor properties, and concrete properties. 


1. IEC 62264 models are: Product Definition, Material, Equipment, Personnel, Process Segment, Production
Schedule, Production Capability, and Production Performance. 


2. SensorML is an approved Open Geospatial Consortium standard. More details are available at
https://www.ogc.org/standards/sensorml


3.2.2 Methodologies for Ontology Design in Manufacturing


An ontology rarely satisfies all the requirements; frequently it needs to be
extended, or new ontologies need to be developed to cover a new domain.
Multiple methodologies were described in the literature to build an ontology. While
each methodology has a different emphasis, there is consensus on most steps to be
followed:


— identify the problem to be solved, opportunity areas, and a potential solu- 
tion [51-53] 

— decide on the formality level required [52] 

— define the problem, scope and competency questions [54] 

— elicit required knowledge from multiple sources. Identify key concepts and 
relationships. Identify terms that refer to the concepts and relationships
[51-53]. When doing so, consider the MIREOT guidelines [55].


— evaluate against a frame of reference [51, 53] 


Specific methodologies were developed to guide the ontologies’ construction in 
the manufacturing domain. [56] proposed a six-stage methodology: identify root 
concepts of taxonomies, identify existing taxonomies, create taxonomies, appli- 
cation test, build terms thesaurus, and refine the integrated taxonomy. [57] also 
defined a six-step methodology but provided a different procedure: specification 
(determine scope and granularity), conceptualization (acquire knowledge), formal- 
ization (structure acquired knowledge), population (convert acquired knowledge 
into frame-based representation), evaluation (validate accuracy and completeness), 
and maintenance (update the ontology once established). A different approach was 
developed by [22], who combined the Unified Modelling Language and the Object
Constraint Language to translate entities from a software object model to
ontology entities. [58] developed a four-step methodology using a Simple Knowledge
Organization System (SKOS) framework to develop a thesaurus of concepts, to
then identify relevant classes and provide logical constraints and rules. [59] suggests
a three-step methodology, inspired by [60], that requires an ontology requirements
specification (purpose, scope, and ontology requirements analysis), an analysis of 
existing resources (reuse ontological and non-ontological resources), and a con- 
ceptualization and formalization. A similar approach was developed by [61], with 
an emphasis on manufacturing design. [62] defined a nine-step methodology for
developing process ontologies. First, it requires defining the project’s purpose and
scope, identifying potential classes and formal attributes, and writing them down
in a context table. Concepts and subconcepts should be drafted into a lattice, which
is used to resolve inconsistencies, and then converted to a class hierarchy. The
last steps integrate the hierarchy with an upper ontology and formally define the
classes through axioms and relationships. Finally, [63] envisions a
different scenario, creating an ontology building methodology for Cyber-Physical
Systems in the manufacturing domain. The methodology consists of three steps:
ontology requirements specification (based on project requirements), lightweight
ontology building (considering requirements, information resources, and other
lightweight ontologies), and heavyweight ontology building (taking into account
the lightweight ontology and ontology design patterns).


3.2.3 Knowledge Graphs in Manufacturing 


Among the rich literature describing knowledge graphs, there is no single agreed
definition for them. Knowledge graphs are built upon the idea that graphs can
be used to capture knowledge. Nodes are used to define abstractions and
instantiate entities, which can be linked with edges, representing relationships [64]. They
can be either domain-specific or domain-independent [65]. Many implementations
constrain the edges in knowledge graphs according to some schema or
ontology [66], providing a formal definition of concepts. [67] provides a comprehensive
introduction to knowledge graphs, discussing data models, schemas, deductive 
and inductive techniques, quality dimensions, refinement methods, and prominent 
open and enterprise knowledge graphs. Deductive inference can be used to derive 
new knowledge from existing data and rules known a priori. Inductive knowledge, 
on the other hand, is acquired by generalizing patterns from input observations,
either using supervised or unsupervised methods. Knowledge graph refinement 
attempts to identify wrong information in the graph (ensure that it is free of error) 
and complete missing information (satisfy completeness) [65]. Such tasks can ben- 
efit from knowledge graph embeddings, which reduce nodes and edges to con- 
tinuous vector spaces while preserving the inherent graph structure [68]. When 
assessing the quality of a knowledge graph, [67] highlights four quality dimen- 
sions: accuracy (the extent to which the knowledge graph represents the real-world 
domain), coverage (avoid the omission of elements that are relevant to the specific 
domain), coherency (conformity to formal schema or ontology), and succinctness 
(avoid irrelevant data). [69] describes a wide range of quality metrics, classifying 
them into four quality categories described by [70]: intrinsic data quality (quality
of data in its own right, regardless of the use case), contextual data quality (assessed
with respect to the task at hand), representational data quality (relates to the format
and meaning of the data), and accessibility data quality (relates to how data can be 
accessed, considering accessibility, licenses, and interlinking). 
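
As a minimal sketch of these ideas, the snippet below stores a knowledge graph as a set of (subject, predicate, object) triples, checks the edges against a small schema (the coherency dimension), and applies one deductive rule known a priori: transitivity of a part-of relation. All entity and predicate names are hypothetical and chosen only for illustration.

```python
# Hypothetical triples (subject, predicate, object); names are illustrative only.
triples = {
    ("spindle_1", "part_of", "milling_machine_7"),
    ("milling_machine_7", "part_of", "line_A"),
    ("line_A", "part_of", "plant_1"),
    ("milling_machine_7", "has_sensor", "vibration_sensor_3"),
}

# A minimal schema: the predicates the graph is allowed to use.
ALLOWED_PREDICATES = {"part_of", "has_sensor"}


def schema_violations(graph):
    """Return the triples whose predicate is not declared in the schema."""
    return {t for t in graph if t[1] not in ALLOWED_PREDICATES}


def infer_transitive(graph, predicate):
    """Apply (a p b) and (b p c) => (a p c) until no new triple appears.

    Returns only the newly derived triples.
    """
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        edges = [(s, o) for s, p, o in inferred if p == predicate]
        for a, b in edges:
            for c, d in edges:
                if b == c and (a, predicate, d) not in inferred:
                    inferred.add((a, predicate, d))
                    changed = True
    return inferred - set(graph)


new_facts = infer_transitive(triples, "part_of")
for fact in sorted(new_facts):
    print(fact)
print("schema violations:", schema_violations(triples))
```

The derived triples were never stated explicitly; they follow from the stored facts and the rule. Inductive techniques, in contrast, would estimate likely edges from patterns in the data, for example through embeddings.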

Knowledge graph implementations can adopt one of three assumptions: open
world, locally closed world [71], or closed world. The open-world assumption
considers that a statement can be true irrespective of whether it is known to
be true, since there is much unknown information compared to the encoded
knowledge. The locally closed world assumption considers the knowledge
representation to be locally complete: the truth of a statement can be
determined as long as the set of existing object values for its subject and
predicate is not empty. Finally, the closed-world assumption assumes that only
statements known to be true can be true.
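
The difference between the three assumptions can be sketched as a single lookup function that returns true, false, or unknown for a statement that may or may not be stored in the graph. This is a simplified reading of the definitions above; the function name, the assumption labels, and the example triples are hypothetical.

```python
def truth_value(graph, subject, predicate, obj, assumption):
    """Return True, False, or None (unknown) for a statement.

    `assumption` is one of "open", "local_closed", or "closed".
    """
    if (subject, predicate, obj) in graph:
        return True
    if assumption == "closed":
        # Anything not stated is taken to be false.
        return False
    if assumption == "local_closed":
        # If some object is known for this subject and predicate, treat
        # that set as complete; otherwise the statement stays unknown.
        known_objects = {o for s, p, o in graph if s == subject and p == predicate}
        return False if known_objects else None
    if assumption == "open":
        # Absence of a statement says nothing about its truth.
        return None
    raise ValueError(f"unknown assumption: {assumption}")


# Hypothetical graph with a single fact.
graph = {("press_2", "produces", "part_A")}

for assumption in ("open", "local_closed", "closed"):
    print(
        assumption,
        truth_value(graph, "press_2", "produces", "part_B", assumption),
        truth_value(graph, "press_9", "produces", "part_A", assumption),
    )
```

Only the locally closed world assumption distinguishes the two queries: something is known about what press_2 produces, so the missing statement is treated as false, whereas nothing is known about press_9, so its statement remains unknown.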

Manufacturing knowledge is gaining an increasing amount of attention [72]. 
The use of knowledge graphs to model it was reported in multiple scenarios. [73] 
describe building and using a knowledge graph to integrate information of products 
and equipment obtained from heterogeneous data sources. The knowledge graph 
is a cornerstone to an intelligent manufacturing equipment information system. 
[74] report encoding purchase record data in a supply chain knowledge graph
and using embeddings to recommend the best suppliers for the purchase demand.
[75] use natural language processing to extract disassembly data (entities and the 
nature of components) and then encode it in a knowledge graph, which helps to 
acquire, analyze and manage disassembly knowledge. A different purpose is
envisioned by [76], who integrate semantic information of the workers with temporal
profiling information and facial recognition. Finally, [77, 78] describe a knowledge
graph to integrate information regarding Industry 4.0 standards and standardiza- 
tion frameworks. Using graph embeddings, they can detect standards relatedness, 
identify similar standards and unknown relations. 


3.3 Active Learning 


Active learning is a field of machine learning that studies how to select unla- 
beled data samples and query an information source to label the selected sam- 
ples [79-81]. The underlying assumption is that unlabeled data is abundant, and 
labeling resources are scarce. Therefore, it is necessary to devise mechanisms that 
enable the identification and selection of samples with a higher information poten- 
tial. The promise to reduce the amount of data required to train new models has 
driven increased interest in active learning in the academic community. At the same
time, its adoption in industry remains low [82].

Three different active learning scenarios are described in the literature [83]: 
membership query synthesis, stream-based selective sampling, and pool-based 
active learning. In membership query synthesis [84, 85], an algorithm creates
its own instances (queries) from an underlying distribution to ask the expert
whether the instance corresponds to a particular label. Stream-based selective
sampling considers one unlabeled instance at a time, evaluating its informativeness
against the query parameters. The learner decides whether to query the teacher or
assign the label by itself. Finally, in pool-based sampling, unlabeled instances
are drawn from the entire data pool and assigned an informativeness score. The most
informative instances are selected, and their labels are requested. While most
methods rely on model uncertainty and clustering to choose the unlabeled examples
[86, 87], new approaches
were developed based on adversarial sampling, Bayesian methods, and weak super- 
vision. [88] introduced a new approach to active learning (Generative Adversarial 
Active Learning (GAAL)) by leveraging Generative Adversarial Networks (GAN). 
The purpose of the GANs is to generate informative instances based on a
random sample of unlabeled instances close to the decision boundary. [89] evolved
this concept by generating synthetic data with a conditional GAN, which learns to
create a specific instance leveraging additional data regarding the desired target
label. [90] introduced variational adversarial active learning, sampling instances
using an adversarially trained discriminator to predict whether the instance is 
labeled or not based on the latent space of the variational auto-encoder. Since 
the sampling ignores the instance labels, the discriminator can end up selecting 
instances that correspond to the same class, regardless of the proportion of labeled 
samples of such class. To solve such an issue, [91] developed a semi-supervised 
minimax entropy-based active learning algorithm that leverages uncertainty and 
diversity in an adversarial manner. Another approach was developed by [92], who, 
instead of uncertainty sampling, used a GAN to generate high entropy samples 
and retrieve similar unlabeled samples from available data to acquire the
corresponding labels. As a variation of GAAL, and based on previous work by [93],
[94] developed a Bayesian generative active deep learning approach, performing
a joint training of the generator (a variational autoencoder) and the learner, which 
requires smaller sample sizes and a single training stage. Different approaches were 
developed by [95-97], who explored using active learning in a weak supervision 
setting. 
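
The pool-based and stream-based scenarios can be sketched with a simple uncertainty measure. The snippet below uses the Shannon entropy of the predicted class probabilities as the informativeness score, which is one common choice and not necessarily the one used in the cited works. The instance identifiers, the probabilities, and the threshold are hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy (in bits) of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def pool_based_query(pool_predictions, batch_size):
    """Pool-based sampling: score every unlabeled instance in the pool
    and return the ids of the `batch_size` most uncertain ones."""
    ranked = sorted(
        pool_predictions,
        key=lambda instance_id: entropy(pool_predictions[instance_id]),
        reverse=True,
    )
    return ranked[:batch_size]


def stream_based_query(probabilities, threshold):
    """Stream-based selective sampling: for a single incoming instance,
    return True to ask the expert or False to let the learner label it."""
    return entropy(probabilities) >= threshold


# Hypothetical model outputs: instance id -> [P(ok), P(defect)].
pool = {
    "img_01": [0.98, 0.02],
    "img_02": [0.55, 0.45],
    "img_03": [0.80, 0.20],
    "img_04": [0.50, 0.50],
}

print(pool_based_query(pool, batch_size=2))
print(stream_based_query([0.95, 0.05], threshold=0.5))
```

In the pool-based case the whole pool is ranked before any label is requested, while in the stream-based case each instance is accepted or skipped on arrival, so the threshold takes the place of the ranking.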

Despite the wide range of active learning approaches, there is currently a research
void regarding the use of active learning in the manufacturing domain [98].
Nevertheless, it was successfully applied to predict the local displacement between
two layers on a chip in the semiconductor industry [99], for automatic optical
inspection of printed circuit boards [100], to improve the predictive modeling for
shape control of composite fuselage [101], and in multi-objective optimization [102].


3.4 Use Cases and Open Challenges 


Semantic technologies and active learning can be used to identify missing
knowledge. Within the semantic domain, [103] proposed a typology of missing
knowledge with three types: abstraction dimension (how the
knowledge is contained inside the knowledge graph structure), terminological
knowledge (how to map terms to concepts), and question-answering dimension (how the
lack of knowledge affects the answering process). [104] proposed to frame the missing knowledge
problem as an anomaly detection problem, where they use a heuristic to identify 
missing knowledge in system rule bases. They take into account user input dur- 
ing the inference process for items that are considered askable. [105] suggested an 
approach based on first-order logic and dual polynomials. They use triples consist- 
ing of a question, answer, and a label that can indicate if the answer is missing or 
wrong. For missing answers, they developed heuristics to create potential answers 
that comply with a closed world setting. [106] proposed developing an interface to
issue SQL queries that can either target a relational database or crowdsource certain
operations, such as finding new data or performing non-trivial comparisons. The authors
consider that while many operations can be successfully completed with data within 
a database, humans can assist with operations such as gathering missing data from 
external sources, moving towards an open-world assumption. 
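
A minimal sketch of missing knowledge detection, under simplifying assumptions, is given below. It is not the method of any of the cited works: it assumes a hypothetical schema that lists the predicates every instance of a class is expected to have, reports the gaps, and turns each gap into a question that could be put to a user. All class, predicate, and entity names are illustrative.

```python
# Hypothetical expectations: predicates every instance of a class should have.
EXPECTED = {
    "Machine": ["located_in", "maintained_by"],
    "Downtime": ["caused_by"],
}

# Hypothetical graph as (subject, predicate, object) triples.
triples = {
    ("press_2", "type", "Machine"),
    ("press_2", "located_in", "line_A"),
    ("downtime_17", "type", "Downtime"),
}


def missing_statements(graph):
    """Return sorted (subject, predicate) pairs that the schema expects
    but for which the graph holds no value."""
    present = {(s, p) for s, p, _ in graph}
    gaps = []
    for s, p, cls in graph:
        if p != "type":
            continue
        for expected_predicate in EXPECTED.get(cls, []):
            if (s, expected_predicate) not in present:
                gaps.append((s, expected_predicate))
    return sorted(gaps)


def to_question(subject, predicate):
    """Turn a gap into a question for the user (the template is illustrative)."""
    return f"What is the value of '{predicate}' for '{subject}'?"


gaps = missing_statements(triples)
for subject, predicate in gaps:
    print(to_question(subject, predicate))
```

Answers collected this way can be written back as new triples, which is how a question-answering interface gradually closes the gaps it detects.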

[107] combined ontologies with natural language processing to develop a ques- 
tion answering interface that enabled users to access available underlying data 
sources. Among other results, the authors highlighted how such a system
provided a positive experience to the users, doubling user retention. [68] describes
a use case in which a knowledge graph simplifies question answering by organizing
the underlying information in a structured format. [108] tackles question answering
by creating vector embeddings of questions and knowledge graph triples so that the
question vectors end up close to the answer vectors. A different approach was
considered by [16], who developed Curious Cat. This application leverages a semantic
knowledge base and users’ contextual data for knowledge acquisition through
question-answering. A similar knowledge acquisition approach for the manufacturing
domain was envisioned by [109], who developed an ontology to model user feedback
based on a given forecast and provided explanations. Following the need to augment
human work with digital technologies and provide personalized information at the
shop-floor level, [110, 111] developed a smart assistant for manufacturing. The smart
assistant creates directive explanations for the users by using heuristics and domain
knowledge. The application tracks users’ implicit and explicit feedback regarding
local forecast explanations, enabling application-grounded evaluations. Though the
authors tested their approach on the demand forecasting use case, the application 
can be extended to other use cases. Other relevant use cases include the usage of
semantic technologies to build a decision support system [112], to automatically
identify opportunities to enhance production scenarios [113], and to perform
intelligent condition monitoring of manufacturing tasks [114].

Though little research reports on the usage of active learning in
manufacturing [98], we consider it can be widely applied in this domain. By selecting
the instances most valuable to the system, it helps to minimize friction with the
end-user and collect valuable data [115]. Active learning can also increase the
diversity of recommendations [116]. This approach can be used in applications
recommending decision-making options, to balance usual recommendations against
decision-making options that require more user feedback (more labeled instances) to
enhance the underlying recommender system. Other active learning use cases relevant
to manufacturing are anomaly and outlier detection [117-119].

In the European Horizon 2020 project STAR (Safe and Trusted Human Centric 
Artificial Intelligence in Future Manufacturing Lines), knowledge modeling and 
active learning are used to gather locally observed collective knowledge regarding 
operations in the manufacturing lines and provide accurate context, relevant data, 
and decision-making options to the users. Among relevant use cases to the project 
are production planning (to gather additional context regarding downtimes and 
anomalies in production), optical quality inspection (to learn from images of defec- 
tive parts), and logistics (to learn logisticians’ decision-making based on available 
options). 


3.5 Conclusion 


Semantic technologies provide means to encode domain knowledge and enable 
deductive inference through reasoning engines. In this work we presented many
upper-level and domain-specific ontologies from the manufacturing domain,
upon which new ontologies can be built. We also described multiple methodologies
used to guide the ontology creation process, some of them specific to the manufac- 
turing domain. 

Semantic technologies can be leveraged for knowledge acquisition. Missing 
knowledge detection can be linked to a question-answering interface to gather 
required knowledge from the users. Similarly, active learning can be used to
identify the most informative data instances and ask the users for feedback. This
enables gradually increasing the dataset and its information density, which can be
leveraged to train machine learning models and enhance their performance. While
little scientific literature reports on the usage of active learning in the manufacturing
domain, multiple use cases can benefit from it, such as anomaly detection in
production planning, optical quality inspection, and the recommendation of
decision-making options.


Chapter 4


Multimodal Human Machine Interactions 
in Industrial Environments 


By Rubén Alonso, Nino Cauli and Diego Reforgiato Recupero 


This chapter will present a review of Human Machine Interaction techniques for 
industrial applications. A set of recent HMI techniques will be provided with 
emphasis on multimodal interaction with industrial machines and robots. This list 
will include Natural Language Processing techniques and others that make use of 
various complementary interfaces: audio, visual, haptic or gestural, to achieve a 
more natural human-machine interaction. This chapter will also focus on
providing examples and use cases in fields related to multimodal interaction in
manufacturing, such as augmented reality. Accordingly, the chapter will present the
use of Artificial Intelligence and Multimodal Human Machine Interaction in the
context of STAR applications.


4.1 Introduction


Since the beginning of the 20th century, automation has played a fundamental
role in the manufacturing industry (Wang, 2019). Starting in the sixties, robots
were introduced in factories, speeding up the manufacturing process. Initially
there were strict boundaries between robots’ and humans’ work-spaces. In order
to avoid injuries, workers were not allowed to enter the robots’ working space.
Unfortunately, this rigid organization has its limitations. Robots and humans
excel in different areas, and proper collaboration between them can result in
a more efficient assembly process. Robots are faster, stronger and more precise
in repetitive assembly tasks, while humans are better at decision making
and can easily adapt to unexpected situations. The rapid improvements
achieved in the 21st century in AI, perception algorithms and robot control
have gradually allowed for a shared work-space between human workers and
robots.

Robots use on-board and external sensors to be aware of their surrounding
environment. The data output of sensors ranges from simple one-dimensional data
(contact sensors, ultrasonic distance sensors) to complex high-dimensional data
(microphones, lidar sensors, RGB cameras, depth cameras). In order to interact
better with human workers and other machines in the factory, robots need to
merge the information received from every kind of sensor available. This
multimodal interaction works in both directions: while interacting with robots,
human workers must not be limited to a restricted group of modalities and
devices (keyboard, mouse, screen), but should be able to use all the modalities
made available by their bodies (speech, vision, gestures, touch).

The goal of this chapter is to present the various types of multimodal interaction
in industrial environments. After introducing the problem of multimodal
interaction, we will present some examples of modalities for natural interaction
between human workers and robots/machines, such as speech (understood here also
as Natural Language Processing of text obtained using speech-to-text tools) and
vision. We will then take the idea of multimodal interaction a step further by
introducing the concept of Extended Reality (XR), where a human is able to
remotely control a robot while sharing its sensory stimuli.

More specifically, the remainder of this chapter is organized as follows.
Section 4.2 covers the possible kinds of multimodal interaction between
humans and machines. Section 4.3 describes how NLP techniques can be employed
within the manufacturing domain. Section 4.4 illustrates human motion
recognition and prediction for human robot interaction in the manufacturing
industry. Section 4.5 illustrates XR technologies, which include augmented,
mixed and virtual reality, and presents a use case showing the application of
virtual reality to remote-control a humanoid robot within the manufacturing
domain. Finally, Section 4.6 concludes the chapter.


4.2 Multimodal Interaction 


Since the presentation of the famous Put-That-There (Bolt, 1980), innumerable
papers have been written about the advantages and disadvantages, problems and
solutions arising from natural interaction between humans and machines.
The discipline of Multimodal Interaction is based on the idea that human
communication is multimodal. Thus, if we hope to interact with machines in the
same way as we do with humans, the interaction must not be limited to a small
group of modalities and devices, as it has been until now, using mainly keyboard
and mouse as data input and graphical representations as data output.

Some authors (Waibel et al., 1996) point out that it is not advisable to reduce
the interaction exclusively to human ↔ machine. They classify multimodal
interaction interfaces into five different classes:


• Human → Machine: in a unidirectional way, as data input mode. For example,
a user dictating a text to the computer or giving orders to a robot (without
receiving any complex feedback).

• Human ↔ Machine: in a bidirectional and interactive way between the
human and the machine, like, for example, in a route planner.

• Human ↔ Multimedia Data: as the extraction of data from multimedia
information. For example, the extraction of meaningful images and the
transcription of text from video-recorded news, for the subsequent search by a
human.

• Human ↔ Machine ↔ Human: where the machine mediates in the interaction
between two humans that do not have the same knowledge, lack part of
the context or simply are far from each other and cannot interact
directly.

• Human ↔ Human (observed and assisted by machine): the interaction is not
mediated by a machine, but one is present to assist the users. For example, a
system that records and transcribes meetings, which can be searched later for
actions defined in previous meetings.


In (Alonso and Torres, 2010) the authors extended the list to support a new 
category: Human ↔ Multiple Machines, where the user interacts in a multimodal
way with a group of programmable machines, such as robots, using different media 
and devices, and collaborates with all of them.

This theoretical classification, essential to understand multimodal interaction,
is somewhat diluted in practice, especially after the emergence of XR technologies.
Nevertheless, the six classes are relevant to the manufacturing industry, and all
have been addressed to some degree in the literature related to Artificial
Intelligence (AI) and Human Machine Interaction (HMI) in recent years.

For example, Roitberg et al. (Roitberg et al., 2015) present an interesting 
approach for improving the efficiency of Human-Robot interaction. This approach 
is based on multimodal interfaces, and is focused on the industrial environment. 
Their research is based on monitoring and interpreting human operations, using 
video depth information provided by different sensors. They use Microsoft Kinect 
v2 for skeleton tracking, Asus Xtion PRO for object tracking and Leap Motion for 
hand and finger pose tracking. 

Liu et al. (Liu et al., 2018) focus on multimodal human ↔ robot collaboration,
especially in repetitive and dangerous tasks. They suggest that the more modalities 
are included and fused, the more robust the collaboration will be. For this pur- 
pose, they present an architecture and a use case for operator-robot collaboration in 
which body motion recognition, hand motion recognition and speech commands 
recognition are combined. 

Concerning the use of multimodal interaction for operator training, 
(Vélaz et al., 2014) analysed the influence of four interaction technologies and 
modalities (including mouse, haptic systems and 2D and 3D position capture) for 
the learning of a procedural assembly task. Among its conclusions, it is worth
noting that training performed with these interaction technologies did not
differ significantly from the traditional training performed by the operators.

Another significant example of multimodal interaction with multiple machines 
that could be extrapolated to the manufacturing sector is the coordination of 
multiple unmanned aerial vehicles. Several authors (e.g.: Cacace et al., 2016b, 
Cacace et al., 2016a) are working on the coordination of machines, using the infor- 
mation obtained through different modalities to solve interaction and coordination 
problems. 

The improvement of recognition thanks to multimodal interaction has been
shown in many studies (e.g.: Kettebekov et al., 2002, Oviatt et al., 2003), where
the benefits of multimodal HMI were demonstrated for completing the available
information and improving the recognition rate using supporting modalities.


4.3 Employment of Natural Language Processing 
Within Manufacturing 


Natural Language Processing (NLP) is a subset of AI that helps identify key
elements in human instructions, extract relevant information and process it in a
manner that machines can understand. Integrating NLP technologies into a system
helps machines understand human language and mimic human behaviour. For
example, Amazon’s Echo, Microsoft’s Cortana and Apple’s Siri make extensive
use of NLP technologies to interact with their users.

NLP technologies can speed up the operation of a whole system by cutting down the
response time. Imagine a scenario where a manufacturing company hires a data
scientist to collect and analyse all the machine readings and report any
problems. One disadvantage of this scheme is that, by the time management reads
the report, a problem might already have occurred and damaged the entire process.
If a robot with embedded sensors and NLP technologies is employed instead, it
might remotely access the machines, detect any change or problem in real time
and propose an action to be executed. The robot might even communicate with users
and accept input in natural language. Therefore, by leveraging NLP technologies,
the middleman can be cut out while keeping the system effective.

Within the manufacturing industry, NLP might be adopted for the following
tasks:


• Process Automation: The use of NLP technologies in the manufacturing
process allows the automatic execution of repetitive tasks like paperwork and
report analysis (e.g., Cristian et al., 2019). It also benefits the workflow
of the entire process, as each employee can focus on tasks which require
human intervention and capabilities. The authors in (Kang et al., 2019)
developed a feedback generation method based on Constraint-based Modeling
(CBM) coupled with NLP and a domain ontology, designed to support formal
manufacturing rule extraction. In detail, the developed method identifies the
necessity of input text validation based on predefined constraints and
provides relevant feedback to help the user modify the input text, so that the
desired rule can be extracted.

• Inventory Management¹: Analysing data about the sales of certain products
is essential for a company to make the right decisions to optimize and
maximize profits. By leveraging NLP technologies, the resulting benefits are:
(1) the entire process becomes more comprehensive; (2) errors related to the
analysis of sales become less likely; (3) it is easier to analyse the
manufactured products and discard those with low quality without affecting the
supply chain and sales. On a different level, the authors in (Vicari and
Gaspari, 2020, Carta et al., 2021) have employed NLP and Machine Learning
techniques to automatically identify patterns, sentiment or other elements
within a text which might be correlated with stock variation.


1. https://cmr.berkeley.edu/2021/01/managing-supply-chain-risk/


• Emotional Mapping: Sentiment analysis and emotion detection
(Atzeni et al., 2018, Atzeni and Recupero, 2020) are among the most exciting
features of NLP. Early NLP systems allowed organizations to collect
speech-to-text communication without accurately determining its full meaning.
Today, NLP approaches can sort and understand the nuances and emotions in human
voices and text, giving organizations deeper insight.
Learning customer expectations is a very important element in manufacturing.
NLP technologies make it possible to identify the emotions and opinions of
customers (Dridi et al., 2019, Recupero et al., 2015) and to suggest actions to
improve products and the selling process. Knowing the expectations of customers
is key to building a longer relationship and creating engagement with them.

• Operation Optimization: Furthermore, NLP technologies can be employed
to trace the performance of equipment, identifying potential inefficiency. 
This enables a detailed monitoring of the machinery and taking measures 
to improve the overall system operability. A review of machine learning 
approaches for the optimization of production processes covers the major- 
ity of relevant literature from 2008 to 2018 dealing with machine learning 
and optimization approaches for product quality or process improvement in 
the manufacturing industry (Weichert et al., 2019). 
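The Emotional Mapping task in the list above can be illustrated with a deliberately simple, lexicon-based polarity scorer. This is only a minimal sketch and not the method of any of the cited works: the word lists, the `polarity` function and the sample feedback sentences are all hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical, tiny polarity lexicons (illustrative only).
POSITIVE = {"good", "great", "reliable", "fast", "excellent"}
NEGATIVE = {"bad", "slow", "faulty", "broken", "poor"}
NEGATIONS = {"not", "never", "no"}


def polarity(text: str) -> int:
    """Return a signed score: >0 positive, <0 negative, 0 neutral."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True  # flip the polarity of the next sentiment word
            continue
        value = (token in POSITIVE) - (token in NEGATIVE)
        if value:
            score += -value if negate else value
            negate = False
    return score


if __name__ == "__main__":
    for review in ("The new drill is fast and reliable",
                   "Delivery was slow and the casing is not good"):
        print(review, "->", polarity(review))
```

Practical systems typically replace the hand-written lexicon with a trained model, but the input/output contract (free text in, polarity out) stays the same.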


4.4 Human Motion Recognition and Prediction 
for Human Robot Interaction in Manufacturing 


In order to interact safely with humans, robots need to understand human
intentions and predict their movements. With the ability to recognise and
predict human actions, industrial robots are able to avoid dangerous collisions
and to improve collaborative work by anticipating some actions (e.g., passing the
worker the proper tool based on the worker’s predicted action).


4.4.1 Video Action Recognition and Prediction 


Human action recognition is a complex task that needs as much information as
possible about the subject performing the action. RGB and depth cameras are the
most suitable sensors for this task: a video sequence of a human performing an
action carries information about their visual appearance, the context of the
action and the motion of their body.

In order to recognise human actions from images, two steps are needed:
action representation and action classification (Kong and Fu, 2018).
Traditionally, handcrafted features are used to represent the actions (Jia and
Yeung, 2008, Yuan et al., 2016), and standard classifiers are used to recognise
the action (e.g. SVM, k-means). The representation of the actions can vary from
low level features (edges, corners) to high level ones (body shape, skeletal
information). Choosing the optimal handcrafted features for action recognition
can be tricky. Automatically extracted features are often more robust and achieve
better performance. The recent increase in computational power led to the
rise of Convolutional Neural Networks (CNNs). CNNs are a type of Deep
Artificial Neural Network (DNN) in which each layer applies convolutions
between 2D weight kernels and the 2D channels of the previous
layer. The output of each layer is a set of 2D feature maps extracted from the
previous layer (low level features for the initial layers and high level ones for
the last layers). With their deep structure and with enough training data, CNNs
are able to generate features for action recognition that outperform handcrafted
ones. CNNs are frequently used to extract features to represent actions,
achieving state-of-the-art results (Kong and Fu, 2018, Özyer et al., 2021).
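To make the layer operation described above concrete, the following minimal sketch computes one feature map from a multi-channel input in plain Python. It is an illustration under stated assumptions, not an excerpt from any CNN library: the function names, the toy image and the edge kernel are hypothetical, and, as in most deep learning frameworks, the "convolution" is implemented as a cross-correlation (the kernel is not flipped).

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(channels: List[Matrix], kernels: List[Matrix]) -> Matrix:
    """Compute one feature map from a multi-channel input.

    Each input channel is correlated with its own 2D kernel (no padding,
    stride 1) and the per-channel results are summed, as in a CNN layer.
    """
    height, width = len(channels[0]), len(channels[0][0])
    k_h, k_w = len(kernels[0]), len(kernels[0][0])
    out_h, out_w = height - k_h + 1, width - k_w + 1
    feature_map = [[0.0] * out_w for _ in range(out_h)]
    for channel, kernel in zip(channels, kernels):
        for i in range(out_h):
            for j in range(out_w):
                feature_map[i][j] += sum(
                    channel[i + di][j + dj] * kernel[di][dj]
                    for di in range(k_h)
                    for dj in range(k_w)
                )
    return feature_map


def relu(feature_map: Matrix) -> Matrix:
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in feature_map]


if __name__ == "__main__":
    # A 4x4 single-channel image with a vertical edge between columns 1 and 2.
    image = [[0, 0, 1, 1]] * 4
    # Hypothetical 2x2 kernel responding to left-to-right intensity increases.
    edge_kernel = [[-1, 1], [-1, 1]]
    print(relu(conv2d_valid([image], [edge_kernel])))
```

In this toy case the feature map is non-zero only in the column where the edge lies, which is the sense in which early layers produce low level features. In a trained CNN the kernel weights are learned rather than written by hand.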

CNNs are data-driven models, and one of their drawbacks is the need for large
labelled datasets with high quality images. The following are some examples of
popular datasets for video action recognition; for a more exhaustive list please
refer to (Kong and Fu, 2018, Özyer et al., 2021):


• UCF-101 (Soomro et al., 2012): One of the most used datasets for video
action recognition. UCF-101 is a large dataset with 13,320 different YouTube
videos from 101 categories. This dataset has high variability in camera angles,
actors and backgrounds.

• YouTube-8M (Abu-El-Haija et al., 2016): This is a very large multi-label
video classification dataset (8 million videos for a total of 500K hours).
The videos are extracted from YouTube and are annotated with 4800
machine-generated labels.

• The Kinetics Human Action Video Dataset (Kay et al., 2017): This
dataset contains 306,245 YouTube clips of 10s each. The clips are grouped
into 400 human action classes and are taken from different YouTube videos.

• Moments in Time (Monfort et al., 2019): A large-scale human-annotated
dataset with one million videos of 3 seconds each, corresponding to dynamic
events. Each video is labeled with one of 339 different classes.


While it is possible to recognize actions from static images, such images lack
information about how the motion evolves over time. CNNs need to be extended in
order to use the temporal information in video sequences. The most common
approaches are the following:


• 3D CNNs: These networks are a particular type of CNN composed of multiple
layers of 3D convolutions obtained using 3D kernels. Receiving as input
a sequence of frames stacked along one dimension, 3D CNNs are able to extract
features related to both space and time. S. Ji et al. (Ji et al., 2012) used 3D
CNNs to recognize human actions in the real-world environment of airport
surveillance videos. The authors compared their model with the state-of-the-art
algorithms at the time, achieving superior performance.

• Multi-stream networks: This type of architecture classifies its input by
merging the outputs of several CNNs, each of which receives a different type
of input. K. Simonyan and A. Zisserman (Simonyan and Zisserman, 2014)
proposed a two-stream CNN for action recognition. The first stream received
as input a single RGB frame, while the second stream received as input the
multi-frame optical flow, carrying temporal information about the action. The
authors tested the network on the UCF-101 dataset, obtaining state-of-the-art
results.

• Recurrent neural networks (RNNs): RNNs are artificial neural networks
with internal loops in the connections between layers. This
structure makes them able to keep a memory of the past and to generate
an output based on the sequence of the most recent inputs received. J. Yue-Hei
Ng et al. (Yue-Hei Ng et al., 2015) introduced a hybrid network that
joins CNNs with RNNs. Their model is composed of GoogLeNet
convolutional layers followed by 5 LSTM layers. In the paper the authors
perform several ablation studies on a video recognition task, showing advantages
and disadvantages of using recurrent layers.
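The merging step of a multi-stream architecture can be sketched as a late fusion of per-class scores. The example below averages the softmax outputs of a spatial (RGB) stream and a temporal (optical flow) stream. This is one common fusion option and is shown only as an assumption-laden sketch: the class names, the raw scores and the `temporal_weight` parameter are hypothetical, and the two CNN streams themselves are not implemented.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities (numerically stable)."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(spatial_scores: Sequence[float],
                temporal_scores: Sequence[float],
                temporal_weight: float = 0.5) -> List[float]:
    """Weighted average of the class probabilities of the two streams."""
    p_spatial = softmax(spatial_scores)
    p_temporal = softmax(temporal_scores)
    return [
        (1.0 - temporal_weight) * ps + temporal_weight * pt
        for ps, pt in zip(p_spatial, p_temporal)
    ]


def predict(labels: Sequence[str], probabilities: Sequence[float]) -> str:
    """Return the label with the highest fused probability."""
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best]


if __name__ == "__main__":
    labels = ["pick", "screw", "idle"]   # hypothetical action classes
    rgb_scores = [2.0, 1.9, 0.1]         # appearance alone is ambiguous
    flow_scores = [0.2, 3.0, 0.1]        # motion favours "screw"
    fused = late_fusion(rgb_scores, flow_scores)
    print(predict(labels, fused))
```

In this toy example the RGB stream alone would pick the first class by a narrow margin, while the fused prediction follows the motion evidence.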


Video action recognition is the problem of recognising the action performed 
by a subject based on a video sequence of the entire movement. The problem 
of predicting the action performed based only on a video of an initial portion 
of the action is called action prediction. The most recent action/motion predic- 
tion systems tend to use the combination of CNNs and RNNs (Lee et al., 2017), 
better suited for the analysis of video sequences. In Human Robot Collaboration 
(HRC) scenarios, the prediction of the type of action performed by the human 
might not be enough. Often the robot needs to know the full body motion 
during the next action performed by the human in order to successfully per- 
form the collaborative task. Recently some researchers were able to predict the 
next frames of a motion based on the action to be performed and past frames 
(Finn et al., 2016, Jung et al., 2019). 

For a robot interacting with a dynamic environment, it is of primary importance
to be able to model the surroundings and to predict how the environment
evolves through time. With a faithful representation of the environment, the
robot is able to detect unexpected behaviours and to correct its actions
accordingly. This idea is borrowed from cognitive science: in the Predictive Coding
(Rao and Ballard, 1999) theory of cognition, the brain is constantly predicting the
sensory outcome (top-down process) and comparing it with the actual one. At
the same time, the error between predicted and actual sensory stimuli is
propagated back to the highest layers (bottom-up process) in order to revise and
update the internal predictive models (a similar idea applied to robot control
was studied under the name of Expected Perception (Barrera and Laschi, 2010,
Cauli et al., 2016)). Jun Tani implemented several models based on the
Predictive Coding paradigm on robotic platforms (Tani, 2016). One of the most
recent is the Predictive Visuo-Motor Deep Dynamic Neural Network (P-VMDNN)
(Hwang et al., 2018). This Deep-RNN model can be used both to predict the next
RGB frames and encoder values during a motion, and to recognise an action
performed by a human placed in front of the robot.
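The predict-and-compare loop described above can be sketched with a very simple stand-in for the learned predictive model. Here a constant-velocity extrapolation plays the role of the top-down prediction, and a time step is flagged as unexpected when the prediction error exceeds a threshold. Everything in this sketch (function names, sample readings, threshold) is hypothetical; the cited models use deep recurrent networks rather than this hand-written predictor.

```python
from typing import List, Sequence


def predict_next(history: Sequence[float]) -> float:
    """Constant-velocity prediction from the last two observations."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])


def surprising_steps(readings: Sequence[float], threshold: float) -> List[int]:
    """Return the indices where the prediction error exceeds the threshold."""
    flagged = []
    for t in range(1, len(readings)):
        error = abs(readings[t] - predict_next(readings[:t]))
        if error > threshold:
            flagged.append(t)
    return flagged


if __name__ == "__main__":
    # Hypothetical hand positions (metres) sampled at a fixed rate; the jump
    # between the fourth and fifth samples is the unexpected event.
    positions = [0.0, 0.1, 0.2, 0.3, 0.9, 1.0]
    print(surprising_steps(positions, threshold=0.3))
```

A robot controller could use such flags to slow down or replan, while a learning system would additionally use the error to update its predictive model.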


4.4.2 Video Action Recognition and Prediction for HRC 
in Manufacturing 


In recent years we have seen a gradual introduction of shared spaces and
collaborative tasks between humans and robots in factories. Human and robotic
workers can collaborate during the assembly of specific components. In these
scenarios, the robot must predict the human coworker’s action in order to plan
its own motion. The application of video action recognition models to HRC in
manufacturing is still a relatively new topic (Wang, 2019).

The most straightforward approaches use handcrafted features to represent the
actions. E. Coupeté et al. (Coupeté et al., 2019) extract the skeletal representation
of the upper torso of a worker from depth images. The sequence of skeletal positions
during a motion is given as input to a Hidden Markov Model in order to recognise
the performed gesture. The model is tested in an assembly scenario where a worker
and a robot collaborate to mount a mechanical piece.
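Gesture recognition with Hidden Markov Models is commonly framed as follows: one HMM is kept per gesture, the likelihood of the observed sequence is computed under each model with the forward algorithm, and the most likely gesture is selected. The sketch below illustrates this with discrete observations. It is not the implementation of the cited work: the two gesture models, their probabilities and the quantisation of hand height into three symbols are hypothetical simplifications (real skeletal features are continuous and multi-dimensional).

```python
from typing import Dict, List, Sequence


class DiscreteHMM:
    """Hidden Markov Model with discrete observation symbols."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[List[float]]) -> None:
        self.start = start  # start[i]: P(first state = i)
        self.trans = trans  # trans[i][j]: P(next state = j | state = i)
        self.emit = emit    # emit[i][o]: P(observation = o | state = i)

    def likelihood(self, observations: Sequence[int]) -> float:
        """Forward algorithm: P(observations | model)."""
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i][observations[0]] for i in range(n)]
        for obs in observations[1:]:
            alpha = [
                sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j][obs]
                for j in range(n)
            ]
        return sum(alpha)


def classify(observations: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Pick the gesture whose model explains the observations best."""
    return max(models, key=lambda name: models[name].likelihood(observations))


# Observation symbols (hypothetical): 0 = hand low, 1 = hand mid, 2 = hand high.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "raise": DiscreteHMM([1.0, 0.0], LEFT_TO_RIGHT,
                         [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]),
    "lower": DiscreteHMM([1.0, 0.0], LEFT_TO_RIGHT,
                         [[0.1, 0.2, 0.7], [0.7, 0.2, 0.1]]),
}

if __name__ == "__main__":
    print(classify([0, 0, 1, 2, 2], MODELS))
    print(classify([2, 2, 1, 0, 0], MODELS))
```

For long sequences the probabilities become very small, so practical implementations usually work with log-probabilities or rescale `alpha` at every step.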

A different approach is to automatically extract the best features using a CNN. 
P. Wang et al. (Wang et al., 2018) use AlexNet to recognise specific gestures from 
a video of a worker assembling an engine. The convolutional layers extract the 
features while 3 fully connected layers classify the gesture. The architecture bases
its classification only on single frames.

We already mentioned that single images lack information about the temporal
evolution of the action. Using both RGB images and optical flow as inputs addresses
this problem. Q. Xiong et al. (Xiong et al., 2020) use the two-stream network
proposed by (Simonyan and Zisserman, 2014) to recognise actions from close-up
videos of workers assembling engine parts. The network has 2 CNN branches,
one receiving as input RGB images and the other optical flow images. Due to the
small size of the engine block assembly dataset used in the experiment, the authors
apply transfer learning. They first pretrain the entire network on a bigger generic
action dataset and then fine-tune the last layers on the engine block assembly
dataset.

RNNs are another class of models able to keep temporal information about recently
seen frames. Z. Liu et al. (Liu et al., 2019) developed a system able to predict the
next action performed by a worker while assembling a computer. A robot passes the
worker the proper tool based on the predicted action. The authors use a CNN as
feature extractor followed by an LSTM layer and a fully connected layer to classify
the next action. The inputs of the system are the images from a top-down camera
mounted above the working table.

While a fair amount of work on action recognition in manufacturing already
exists, the problem of human motion prediction in HRC needs to be studied in
more detail. A robot able to predict at each instant where the body of the human
co-worker will be can more easily avoid collisions, spot mistakes and take
recovery actions.

CNNs are among the most reliable tools for feature extraction from
videos, but they need a large amount of data to learn properly and to generalise.
Unfortunately, not many datasets for video action recognition in factory assembly
scenarios exist (Kong and Fu, 2018, Özyer et al., 2021). New specific video datasets
are difficult to generate and the labelling process is highly time consuming. Domain
transfer and simulated datasets are a valid solution to the problem. M. Fabbri et al.
(Fabbri et al., 2018) generated a large dataset for Multi-People Tracking using the
Grand Theft Auto V game engine. Generating a simulated dataset is faster than
collecting a real one, and labelling is automatic. An action recognition model
trained on a simulated dataset with high variability and realism can transfer the
knowledge learned in simulation to the real world.


4.5 XR in Manufacturing Industry 


XR-related technologies are facilitating multimodal interaction in Industry 4.0 and
thus enabling tangible in-situ visualisations and interactions with industrial assets
(Simões et al., 2018).

The term XR can be considered an umbrella for the terms augmented
(AR), mixed (MR) and virtual (VR) reality, which differ in how much real
and virtual content they display and in their level of interactivity. As detailed in
(Alizadehsalehi et al., 2020), VR is characterised by high virtual content and low
interactivity, while AR is characterised by high real content and higher interactivity.
MR lies between the two, including higher levels of virtual and real content,
and high interactivity.


4.5.1 Related Work of XR in Industry


The use of XR in industry has been suggested since the early 90’s, when, for
example, (Thomas and David, 1992) proposed the superimposition of certain
information on real world objects. Since then, hundreds of examples of XR-aided
manufacturing have been reported; Bottani and Vignali present an exhaustive list
of them in their article “Augmented reality technology in the manufacturing
industry: A review of the last decade” (Bottani and Vignali, 2019).

In addition to the Boeing article (Thomas and David, 1992) already mentioned
above, Karlsson et al. (Karlsson et al., 2017) suggest an approach for
the presentation of superimposed information, e.g. information on potential
bottlenecks, that can help decision making in manufacturing.

Workforce training is another activity where the use of XR is increasing,
especially after the rise of robotic systems and complex machines on shop floors.
For example, safety training is an area where multimodal interaction and XR are
particularly worthwhile. As detailed in (Doolani et al., 2020), these systems reduce
the risk of harm that can be caused by machines, as well as damage to them, and
offer a platform for a learning-by-doing approach that can be used multiple times
without worrying about the costs, availability or risks associated with the use of
real machines.

The possibility of remote guidance is another advantage of XR systems in the
manufacturing environment. For example, (Fast-Berglund et al., 2018) validated a
use case in which an expert uses AR to guide a novice operator in an assembly
task, giving directions and corrections in case something goes wrong during
assembly. Their conclusion is that, because AR can give instant
feedback, it becomes practically impossible to do the assembly wrong, and therefore
the results are highly positive.


4.5.2 Use Case: Virtual Reality to Remote-control a Robot 


In this section we describe the work of the authors in (Alonso et al., 2021)
on a general-purpose, open-source framework for teleoperating a NAO
humanoid robot through a Virtual Reality (VR) headset. As the proposed
architecture is general, it would be straightforward to replace the NAO robot with
Kuka² or Universal Robots³, two well-known robots used in several production
environments around the world. The architecture presented in (Alonso et al., 2021)
includes a VR interface for the Oculus Rift⁴ using the Unity game engine to perform
robot actions through the VR controllers, and exploits the flexibility of the Robot
Operating System (ROS) for the control and synchronization of the robot hardware.
This work gives ideas on a potential architecture that can be employed within the
manufacturing domain to allow robots (e.g. Kuka or Universal Robots, both
supported by ROS) to protect workers from repetitive, mundane, and dangerous tasks
while also creating more desirable jobs such as engineering, programming,
management and equipment maintenance. In the following we give details of the tools
used in their work, starting with some background information about the Unity,
ROS and NAO software platforms.


2. http://www.kuka.com


3. https://www.universal-robots.com

Unity 3D⁵ is a game engine which supports the development of 2D and 3D
games, Virtual and Mixed Reality experiences and simulations. 

ROS is an open-source framework for robot software whose architecture includes
Nodes, Messages, Topics, Services, and Actions. Nodes are processes that carry out a
computation. Messages are exchanged between nodes: a node sends a message by
publishing it on a certain topic. Services are used by nodes that need to perform
remote procedure calls. Actions are used to request that a node perform a
longer-running task and to receive a reply. ROS packages are collections of
code for easy reuse, and stacks are collections of packages that jointly offer some
functionality.
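The topic-based message exchange described above can be illustrated with a toy, in-process publish/subscribe bus. This sketch deliberately does not use the ROS client libraries and does not reproduce their API. The `TopicBus` class, the topic name and the message fields are hypothetical, and real ROS nodes run as separate processes that may be distributed across machines.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Callback = Callable[[Any], None]


class TopicBus:
    """Toy in-process stand-in for topic-based messaging (not the ROS API)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register a callback that is invoked for every message on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver the message to all subscribers; return how many received it."""
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])


if __name__ == "__main__":
    bus = TopicBus()
    received = []

    # "Robot side" node: listens for head targets (hypothetical topic name).
    bus.subscribe("/head/target", received.append)

    # "VR side" node: publishes the orientation of the headset.
    bus.publish("/head/target", {"yaw": 0.3, "pitch": -0.1})
    print(received)
```

The publisher does not need to know who is listening. This decoupling is what allows components such as a VR interface and a robot driver to be developed and replaced independently.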

The authors employed NAO as the robotic platform but, as already mentioned,
robots such as Kuka or Universal Robots may be employed. The Kuka system
software is the operating software containing all the basic functions needed for the
deployment of the robot system. Kuka robots come with a control panel with a
display, axis control buttons and a 6D mouse, which is used to manually move the
robot. The control panel allows users to view and modify existing programs and to
create new ones. A rugged computer in the control cabinet communicates with
the robot system via the Multi Function Card, which controls the real-time servo
drive electronics. Servo position feedback is transmitted to the controller through
the DSE-Resolver Digital Converter/RDC connection. The software includes two
elements running in parallel: the user interface and program storage. Figure 4.1
shows a Kuka robot palletizing food in a bakery. Universal Robots are industrial
collaborative robot arms (cobots): six-jointed robot arms with a very
low weight (from 11 to 33 kilos) and a lifting ability from 3 to 16 kilos. These
cobots can work right alongside personnel with no safety guarding, based on the
results of a mandatory risk assessment. The robot arm can run in two operating
modes of the safety functions: a normal and a reduced one. A switch between safety
settings during the cobot’s operation is also possible. Figure 4.2 shows a Universal
Robot lifting an object.


4. https://www.oculus.com/rift/


5. https://unity.com/


Figure 4.1. A Kuka robot palletizing food in a bakery (taken from Wikipedia).

In their work the authors show how, through the controllers and the VR headset, the
VR interface allows both the teleoperation of the NAO and the recording of a
movement sequence for later execution. During teleoperation, the user and the robot
are not in the same room. Therefore, the user exploits the VR interface as a source
of input and to obtain a visible and understandable representation of the remote
robot’s status. The recording of a movement sequence allows the user to perform a
number of tasks and save them in collections. Whenever needed, they can be played
back.
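The record-and-replay behaviour described above can be sketched as a small container of named pose sequences. This is an illustrative assumption about how such a feature might be structured, not the code of the cited framework: the `MovementLibrary` class, its method names and the use of joint-angle dictionaries are hypothetical, and the `send` callback stands in for whatever interface actually commands the robot.

```python
from typing import Callable, Dict, List, Optional

Pose = Dict[str, float]  # joint name -> angle in radians


class MovementLibrary:
    """Stores named movement sequences and replays them on demand."""

    def __init__(self) -> None:
        self._collections: Dict[str, List[Pose]] = {}
        self._active: Optional[str] = None

    def start_recording(self, name: str) -> None:
        """Begin a new (or overwrite an existing) sequence with this name."""
        self._collections[name] = []
        self._active = name

    def record(self, pose: Pose) -> None:
        """Append one pose to the sequence currently being recorded."""
        if self._active is None:
            raise RuntimeError("no recording in progress")
        # Copy the pose so later changes by the caller do not alter the recording.
        self._collections[self._active].append(dict(pose))

    def stop_recording(self) -> None:
        self._active = None

    def play(self, name: str, send: Callable[[Pose], None]) -> int:
        """Send every stored pose to the robot interface; return the pose count."""
        for pose in self._collections[name]:
            send(pose)
        return len(self._collections[name])


if __name__ == "__main__":
    library = MovementLibrary()
    library.start_recording("look_around")
    for yaw in (0.0, 0.5, -0.5, 0.0):
        library.record({"HeadYaw": yaw})
    library.stop_recording()
    library.play("look_around", print)  # print stands in for the robot interface
```

A fuller version would also store timestamps so that the playback speed matches the recorded motion.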

Because the ROS framework allows development and execution across different
machines, it is easier and more flexible to support both the storing and the
playback of the robot’s recorded actions.

Figure 4.3 illustrates the architecture of the VR system developed by the authors. 
It includes three main software components (VR, ROS and Rosbridge) and two 
hardware devices (Oculus Rift and NAO). The VR Component leverages the Unity 


6. https://www.iso.org/standard/62996.html

Figure 4.2. A Universal Robot lifting an object (taken from
https://www.therobotreport.com/voith-robotics-cuts-ties-franka-emika-adds-universal-robots/).


[Diagram labels: ROS Component, Nao Robot, Providers, Action Clients, Action Server, UI Elements]

Figure 4.3. Architecture of the Virtual Reality system. Taken from Alonso et al., 2021. 


game engine for displaying the interface on the Oculus Rift. Unity was chosen
because the existing Oculus SDK facilitates the development process. The ROS
component controls the robot through the VR simulation or through the management
of the real hardware. It includes the ROS framework, multiple packages provided by
the ROS Nao drivers, and the custom Publishers, Subscribers, Action Servers and
Service Providers that have been implemented to support the VR control. Rosbridge
is the connection between the VR and ROS components. It provides the methods for
passing messages between them, for managing the serialization and deserialization
of information, and for the connection and delivery through WebSockets.
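
As a rough illustration of the serialization and deserialization step, the sketch below builds and parses a JSON message in the style of the rosbridge protocol (an `op` field together with `topic` and `msg`). The topic name `/nao/head_yaw` and the payload are hypothetical, the helper functions are not taken from the authors' system, and the WebSocket transport itself is omitted.

```python
import json


def make_publish_message(topic, payload):
    """Serialize a rosbridge-style 'publish' operation to a JSON string."""
    return json.dumps({"op": "publish", "topic": topic, "msg": payload})


def parse_message(raw):
    """Deserialize a JSON string back into (operation, topic, payload)."""
    data = json.loads(raw)
    return data["op"], data["topic"], data["msg"]


# Hypothetical topic and payload, used only for illustration.
raw = make_publish_message("/nao/head_yaw", {"data": 0.25})
op, topic, payload = parse_message(raw)
```

In a full system, the string `raw` would be sent over a WebSocket connection from the VR component and parsed on the ROS side.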


4.6 Conclusions 


In this chapter we have presented various types of multimodal interaction within
the manufacturing domain. First, we introduced the classification of multimodal
interaction interfaces, indicating all the possible ways a user can interact with
one or multiple machines. Then we briefly described the NLP research area and how
it can be employed to let an independent system (e.g., an agent or robot)
automatically identify relevant information within the manufacturing domain. Next,
we examined the ability of robots to recognise and predict human actions by using
cameras as sensors and deep learning as a breakthrough machine learning technology.
We then discussed the XR-related technologies (e.g., augmented, mixed, virtual
reality) and how they can facilitate multimodal interaction in Industry 4.0.
Finally, we showed the architecture of a use case where virtual reality technology
has been adopted to remote-control a robot, and how this schema can be adapted for
use within the manufacturing domain.

Secure, safe, reliable AI systems in manufacturing environments, such as those 
investigated in the STAR project, can benefit from all of these technologies in 
their goal to make systems more trusted and human-centric. As part of the STAR 
project, research will continue on Human Robot Interaction and on knowledge
systems that benefit from NLP techniques and are accessible through multimodal
interaction.
Chapter 5


A Review of Explainable Artificial 
Intelligence in Manufacturing 


By Georgios Sofianidis, Jože M. Rožanec, Dunja Mladenić
and Dimosthenis Kyriazis 


The implementation of Artificial Intelligence (AI) systems in the manufacturing
domain enables higher production efficiency, outstanding performance, and safer
operations, leveraging powerful tools such as deep learning and reinforcement 
learning techniques. Despite the high accuracy of these models, they are mostly con- 
sidered black boxes: they are unintelligible to the human. Opaqueness affects trust 
in the system, a factor that is critical in the context of decision-making. We present 
an overview of Explainable Artificial Intelligence (XAI) techniques as a means 
of boosting the transparency of models. We analyze different metrics to evaluate 
these techniques and describe several application scenarios in the manufacturing 
domain. 
5.1 Introduction


The increasing digitalization of every aspect of life provides vast amounts of data,
enabling the implementation of Artificial Intelligence (AI) models. The
manufacturing and process industry is not an exception to this trend. AI models
play a significant role in many aspects of the manufacturing process. They improve
quality by enhancing quality inspection and process monitoring in production lines,
ease the reconfiguration and customization of automated part handling, support
fault diagnosis and event prediction, make production management more agile and
production planning more flexible, and enable safe collaboration between humans and
cobots. The latter in particular is a big step towards the transition into
Industry 5.0, where the focus is on the synergy between humans and robots and the
actors are collaborators instead of competitors.

AI models provide the means to automate many tasks and achieve unprecedented 
performance levels. However, in most cases, such models are opaque to the user: 
they work as black-boxes. Their predictions are mostly accurate, but no intuition 
behind the reasoning process is available to human users. Given the impact of those 
predictions on the decision-making processes, it is crucial to develop mechanisms 
and techniques that give users insight into the reasoning process of such AI models.
The development of such techniques and mechanisms, and the way those insights are
presented, has given birth to a research field of its own, known as Explainable
Artificial Intelligence (XAI). While the field of XAI can be traced back to the
1970s [44], it has flourished anew since the rise of modern deep learning [55].

Though there is no single definition of the scope of this research field, most
authors agree it includes intrinsically interpretable models and post-hoc
explainability models (the model's capability of being explained by another
interpretable model). Authors identify two sources of model opacity (or opaqueness)
[5]: (i) the complexity of the formal structure of the model is beyond human
comprehension, or alien to human reasoning, or (ii) the inner workings of the model
cannot be shared (e.g., because they are considered a trade secret). Model
opaqueness can be relative to expert knowledge: e.g., a model can be opaque to an
analyst but not to the machine learning engineer. [32] introduced the term deep
opacity to describe models whose opacity cannot be removed even by human experts.
When presenting insights on the reasoning process of an AI model, the explanations
should resemble a logic explanation [43] and take into account relevant context.
[19] considers that context has three elements related to the explainee:
(i) Profile (the user profile, i.e., to whom we present the explanation),
(ii) Objective (the goals of the explanation, e.g., whether the explanations are
meant to improve the model, enhance trust in the system, aid decision-making, or
foster action based on decisions made), and (iii) Focus (whether the explanation is
global or local). In local explanations, the specific point of interest must be
considered part of the context. When the explanations aim to aid decision-making or
action-taking, they should provide information regarding actionable features.

Figure 5.1. XAI taxonomy.

XAI techniques and methods can be classified along three dimensions: the
explainability source, the scope of the explanation, and the level of dependency
on the forecasting model used (see Fig. 5.1). Regarding the explainability source,
we distinguish intrinsically explainable models from forecasting models that
require post-hoc models to get insights into the reasoning process behind the
forecasts. Concerning the explanation's scope, explanations can be global (describe
the behavior of the whole model for the average of forecasts provided) or local
(describe the model's behavior for a particular forecast). Finally, regarding the
dependency on the forecasting model, we distinguish model-agnostic techniques (can
be applied to any AI model) from model-specific techniques (can be applied only to
AI models built with a particular algorithm or type of algorithms).

In this chapter, we introduce the field of Explainable Artificial Intelligence, 
describing methods and techniques used to identify meaningful features driving 
forecasts, current approaches used to evaluate such models, applications and use 
cases in the industrial domain, and open challenges. When doing so, we do not 
consider intrinsically explainable models. 


5.2 Methods and Techniques 


Different methods and techniques have been introduced to boost the transparency
and acceptance of AI models, and different taxonomies have been proposed in the
literature based on the explanation generating mechanism, the type of explanation,
the scope of explanation, the type of model it can explain, or a combination of
these features. [1] classified those methods into intrinsically interpretable
models and post-hoc explanations, and divided the latter into text explanations,
visual explanations, local explanations, explanations by example, explanations by
simplification, and feature relevance explanation techniques. [4] introduced a
categorization of explanation
methods based on the type of explanation returned and divided them based on the 
most common data types such as tabular, image, and text. For tabular data, feature 
importance is one of the most popular types of explanation returned by local
explanation methods. The explainer assigns to each feature an importance value,
which represents how important that particular feature was for the prediction under
analysis. The sign and magnitude of each importance value are also considered to
understand the contribution of each feature. Similarly, in the field of image
classification, saliency maps can be used as explanations. Those are modeled as
matrices with the same dimensions as the image we want to explain, and each element
of the matrix represents the saliency of the corresponding pixel for the forecast.
Another type of explanation that can be implemented on tabular data is the
rule-based explanation. Human-readable decision rules can give the end-user an
explanation of the reasons that lead to the final prediction. A decision rule (also
called a factual or logic rule) is a set of premises that lead to a specific
forecast. Counterfactual rules are a set of rules that lead to the opposite of a
specific forecast. [30] classified XAI techniques according to the type of
explanation and the scope of explanation. The three types distinguished are
model-based, attribution-based, and example-based explanations. In this chapter, we
present some of the well-known explainability methods based on the taxonomy
introduced by [30].

The class of model-based explanations includes methods that are either explainable
by nature (intrinsic explainability) or methods that use a different interpretable 
model to explain the task model (post-hoc explainability). The first subclass can 
be divided into sparse linear classifiers (e.g., linear or logistic regression, general- 
ized additive models (GAMs)), discretization methods (e.g., rule-based learners, 
decision trees), and example-based models (e.g., K-nearest neighbors). The sec- 
ond subclass includes interpretable surrogate models that can approximate the task 
model and can be used as post-hoc explanations. 

The class of attribution-based explanations uses the explanatory power of input
features to explain the task model. These approaches are also known as feature
(a.k.a. variable) importance, relevance, or influence methods. Most post-hoc
explanations fall under this category, which can further be divided into
perturbation-based and
backpropagation-based methods. 

Among the perturbation-based methods, we can find the Prediction Difference
Analysis (PDA) [40], which is based on the idea that the relevance of an input
feature with respect to a class can be estimated by measuring how the predictions
change if this particular feature is removed. This method cannot deal with
saturated classifiers (models whose output does not change after removing part of
the features). A similar approach for images was developed by [60] with the
Deconvolutional Networks, which attempt to reconstruct the feature map into the
layer input or the original image. The proposed networks used convolution,
max-pooling layers, and the ReLU activation function. Sliding a gray square over
the image, the authors measure changes in feature activations and classification
scores. A variation of this method was developed by [11], who, instead of using a
gray square, replace regions of an image with constant values or noise, or blur
them. This method was extended by [35], who chose upsampled, random binary masks to
perform the occlusions and analyzed their impact on the classification score of the
target class. Another variation of [60] was introduced by [63], who removed several
features at once by using prior knowledge about images and choosing patches of
connected pixels as feature sets to analyze the effects of different window sizes
on top scoring classes. The huge computational cost of this method was later
reduced by [13] through the Contextual Prediction Difference Analysis, which also
solved the problem of saturated classifiers by producing a model-aware saliency map.
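
The occlusion idea behind these perturbation-based methods can be sketched in a few lines. The example below is a minimal illustration, not the procedure of any of the cited works: the image is a nested list of pixel intensities, `toy_model` is a hypothetical stand-in for a classifier score, and the patch size and fill value are arbitrary assumptions.

```python
def occlusion_map(model, image, patch, fill=0.5):
    """Slide a constant-valued square over `image` and record the score drop."""
    height, width = len(image), len(image[0])
    base_score = model(image)
    saliency = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            # Copy the image and overwrite one patch with the fill value.
            occluded = [row[:] for row in image]
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            # A large drop means the occluded region mattered for the score.
            drop = base_score - model(occluded)
            for r in rows:
                for c in cols:
                    saliency[r][c] = drop
    return saliency


def toy_model(image):
    """Hypothetical 'classifier': the score is the mean of the top-left 2x2 block."""
    return (image[0][0] + image[0][1] + image[1][0] + image[1][1]) / 4.0


image = [[1.0, 1.0, 0.0, 0.0],
         [1.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
saliency = occlusion_map(toy_model, image, patch=2)
```

Only the top-left patch receives a non-zero saliency here, because it is the only region the toy score depends on. The cost grows with the number of patches, since each one requires a separate forward pass of the model.
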

Another family of explainability methods computes feature attributions from a
forward or backward pass through the network. They require architectural or
backpropagation rule modifications, or access to intermediate layers. However, most
of these methods have lower computational costs than the ones mentioned above,
leading to faster results. One of the first approaches of this kind was introduced
by [47], who computed feature attributions by taking the partial derivative of the
output class with respect to the input. The resulting absolute values allow
identifying which input features can be perturbed the least for the output to
change the most. A drawback of this method is that it is noisy, and the absolute
value of the gradients prevents the detection of positive and negative evidence in
the input. This approach was improved by the Gradient * Input method [46], which
increases the sharpness of attribution maps by taking the signed partial
derivatives of the output with respect to the input and multiplying them
feature-wise by the input itself. The multiplication with the input indicates an
interest in salience rather than sensitivity. [46] introduced the Deep Learning
Important FeaTures (DeepLIFT) method, which uses a derivative-based method to
propagate activation differences instead of gradients through the network. The
intuition behind the method is that though the partial derivatives do not explain a
single decision, they indicate what change in the image could change the
prediction. Along the same line, [53] developed the Integrated Gradients approach,
which relies on the idea of computing attributions by multiplying the input
variable element-wise with the average partial derivative, as the input varies from
a baseline to its final value. SmoothGrad [49] takes a different approach: it
focuses on local sensitivity and averages the sensitivity maps obtained from
several small perturbations of an input image, which has a smoothing effect. The
effect is enhanced by further training with these noisy images. Finally, the
sensitivity maps are sharpened to increase their quality. [60] was extended by
[52], who proposed the All Convolutional Net as an alternative that replaces the
max-pooling layers with convolutional layers with an increased stride. A slightly
different approach was proposed by [61], who introduced the Class Activation
Mapping (CAM). This method relies on the observation that some convolutional layers
behave as unsupervised object detectors, and it uses global average pooling to
create heat maps of a pre-softmax layer. The heat maps point out the regions of an
image that are responsible for a prediction. Gradient-weighted Class Activation
Mapping (GradCAM) [45] uses the gradient information to understand how strongly
each neuron activates in the last convolutional layer of the neural network. The
localizations are combined with existing high-resolution visualizations to obtain
high-resolution class-discriminative guided visualizations as saliency masks. The
CAM and GradCAM approaches inspired the GradCAM++ method [6], which combines the
positive partial derivatives of feature maps of a rear convolutional layer with a
weighted special class score to explain the occurrence of multiple object instances
in an image. Layer Wise Relevance Propagation (LRP) [3] is a gradient-based method
that suffers from vanishing gradient problems. Its main idea is the decomposition
of the prediction function as a sum of layer-wise relevance values. The prediction
is redistributed backward using local redistribution rules until a relevance score
is assigned to each input feature. There are different variations of the LRP
algorithm based on the backward redistribution rule.
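
To make the difference between Gradient * Input and Integrated Gradients concrete, the sketch below applies both to a small analytic function instead of a neural network. The function `f`, its hand-written gradient `grad_f`, the zero baseline and the number of integration steps are all assumptions chosen for illustration; with a real network the gradients would come from automatic differentiation.

```python
def f(x):
    """Toy differentiable 'model' with one non-linear and one linear feature."""
    return x[0] ** 2 + 3.0 * x[1]


def grad_f(x):
    """Analytic gradient of f with respect to its inputs."""
    return [2.0 * x[0], 3.0]


def gradient_times_input(x):
    """Signed partial derivatives multiplied feature-wise by the input."""
    return [xi * gi for xi, gi in zip(x, grad_f(x))]


def integrated_gradients(x, baseline, steps=50):
    """(x - baseline) times the average gradient along the straight path."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule on [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg_grad)]


x = [2.0, 1.0]
baseline = [0.0, 0.0]
gxi = gradient_times_input(x)
ig = integrated_gradients(x, baseline)
```

In this example the Integrated Gradients attributions sum to `f(x) - f(baseline)`, whereas the Gradient * Input attributions overshoot on the quadratic feature because they use only the gradient at the input itself.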

Many explainability methods rely on surrogate models to provide explanations
regarding the reference model. One such method is TREPAN [7], which provides
heuristics to issue queries against neural networks and create a decision tree that
approximates forecasts from the given network, while providing an interpretable set
of rules that explain the forecast. A more general approach was presented in the
Local Interpretable Model-agnostic Explanations (LIME) [38], which can explain the
predictions of any AI model through a post-hoc, local, linear, and interpretable
model. The surrogate model attempts to learn a particular forecast by fitting the
given feature vector and perturbed inputs to the results obtained from the
reference model. Since the creation of LIME, multiple variants have been developed.
k-LIME [16] uses local generalized linear model surrogates to explain the
predictions, while local regions are defined by k clusters instead of perturbed
samples. The value of k is chosen so that the predictions from the local
generalized linear models maximize R². In addition to this, a global surrogate
generalized linear model is trained to provide information about overall average
feature trends. DLIME [58] proposes a deterministic version of LIME, where instead
of random perturbations, agglomerative hierarchical clustering is applied to group
the training data. Hierarchical clustering does not require prior knowledge
regarding clusters. To determine the number of clusters, the dendrogram is cut
where the gap between two successive groups is the largest. A k-Nearest Neighbour
classifier is then trained on the clusters obtained to classify new instances into
them. All data points belonging to a given cluster are used to train a linear
model, which provides deterministic and consistent local explanations. LIMEtree
[50] follows a similar approach to LIME, building a regression tree as surrogate
model. The regression tree enables capturing non-linear relationships between the
interpretable features and the target variable. At the same time, it does not
require independence between interpretable features. The authors consider the
model's biggest advantage to be that it provides personalized counterfactual
explanations through an interactive interface that enables imposing certain
conditions on the sample of interest. Inspired by LIME, [9] developed STREAK, an
interpretability method for neural networks conceived as a set function
maximization, achieving accuracy similar to that of LIME while having a faster
runtime. A slightly different approach is presented in Anchors [39], where a set of
rules replaces the surrogate model. Since the local behavior of a model can be
highly non-linear, the authors propose using a set of if-then rules, which are
intuitive and easy to understand. To explore the model's behavior in the
perturbation space, the authors apply multi-armed bandits to incrementally
construct the rules, generate candidate predicates, and choose the one with the
highest precision until a given precision threshold is reached with a high
probability. LoRE (Local Rule-Based Explanations) [14] proposes a parameter-free,
two-step method that also provides rule-based explanations. First, it creates a
balanced set of neighbor instances using a genetic algorithm to explore the
decision boundary around the data point of interest. Then it builds a decision tree
classifier, from which decision rules and counterfactuals can be derived. Local
Foil Trees [54] specifically deal with generating counterfactual explanations. To
that end, they consider two possible outputs: the model forecast (fact) and the
desired label (foil). A decision tree is then built based on the local dataset. The
rules are computed from the difference between the paths to the "fact leaf" and the
"foil leaf".
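
The core of a LIME-style explanation, a proximity-weighted linear model fitted to the reference model's outputs on perturbed inputs, can be sketched as follows. This is a simplified illustration under several assumptions: the perturbations form a fixed grid rather than random samples, the features are used directly instead of an interpretable binary representation, and `black_box`, `kernel_width` and the helper names are hypothetical.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def local_linear_surrogate(model, instance, offsets, kernel_width=1.0):
    """Weighted least squares around `instance`; returns [intercept, coefficients...]."""
    d = len(instance)
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for offset in offsets:
        sample = [xi + oi for xi, oi in zip(instance, offset)]
        # Samples closer to the instance of interest get a larger weight.
        dist2 = sum(o * o for o in offset)
        weight = math.exp(-dist2 / kernel_width ** 2)
        row = [1.0] + sample
        y = model(sample)
        for i in range(d + 1):
            xtwy[i] += weight * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    return solve_linear_system(xtwx, xtwy)


def black_box(x):
    """Hypothetical reference model; linear here so the surrogate can match it."""
    return 3.0 * x[0] - 2.0 * x[1] + 1.0


offsets = [(dx, dy) for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)]
surrogate = local_linear_surrogate(black_box, [1.0, 2.0], offsets)
```

The surrogate coefficients play the role of local feature importances. For a non-linear reference model they would only approximate its behavior near the chosen instance, with the kernel width controlling how local the fit is.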

While most explainability methods based on surrogate models provide specific 
techniques, [17] developed a framework that enabled comparing surrogate mod- 
els on three dimensions: data sampling, explanation generation, and interaction. 
[51] considered a slightly different approach and developed an algorithmic
framework (bLIMEy, build LIME yourself) that enables building custom local
surrogate explainers for model predictions, considering three dimensions: data sampling,
explanation generation, and interpretable representation. 

Another local-agnostic explanation method is SHAP [28] which stands for 
SHapley Additive exPlanations and can be used to produce several explanation 
models. These models compute SHAP values: a unified measure of feature impor- 
tance based on the Shapley values, a concept from cooperative game theory. The 
different explanation models proposed by SHAP differ on how they approximate 
the computation of the SHAP values. The explanation models provided by SHAP
are called additive feature attribution methods. The construction of the SHAP values
allows them to be employed both locally, in which case each observation gets its own
set of SHAP values, and globally, by exploiting collective SHAP values.
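
The Shapley values on which SHAP builds can be computed exactly for a handful of features by enumerating all coalitions. The sketch below does so for a hypothetical three-feature model; replacing absent features by a single baseline value is a simplifying assumption made here, and is only one of several ways to define the value of a coalition.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a set function value(frozenset) -> float."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Probability weight of a coalition of this size preceding player i.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model(x):
    """Hypothetical model with one additive term and one interaction term."""
    return 2.0 * x[0] + x[1] * x[2]


instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]


def coalition_value(present):
    """Model output when only the features in `present` take the instance's values."""
    mixed = [instance[i] if i in present else baseline[i] for i in range(3)]
    return model(mixed)


phi = shapley_values(coalition_value, 3)
```

The interaction term is split evenly between the two features involved, and the attributions add up to the difference between the prediction for the instance and the prediction for the baseline, which is the additive property referred to above. The enumeration is exponential in the number of features, which is why approximations are needed in practice.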

In the image classification field, two explainers can be implemented for deep
networks: DEEP-SHAP and GRAD-SHAP. DEEP-SHAP is a high-speed approximation
algorithm for SHAP values in deep learning models that builds on the DeepLIFT
algorithm. The implementation differs from the original DeepLIFT by using a
distribution of background samples as baseline instead of a single value, and by
using Shapley equations to linearise non-linear components of the black-box such as
max, softmax, products, and divisions. GRAD-SHAP, instead, is based on the
Integrated Gradients (IntGrad) and SmoothGrad algorithms. IntGrad values are a bit
different from SHAP values, and require a single reference value to integrate from.
As an adaptation to approximate SHAP values, GRAD-SHAP reformulates the integral as
an expectation and combines that expectation with sampling reference values from
the background dataset, as done in SmoothGrad.

Another family of explainability techniques is that of example-based explanations. 
Methods in this class explain the task model by selecting particular instances from 
the dataset that describe the model or by creating new instances. Instances that are
well predicted by the forecasting model (prototypes) and instances that are not well
predicted by the model (criticisms) are the influential instances for the model
parameters or output, while counterfactual explanations indicate the changes
required on the input side to cause significant changes (e.g., reverse the
prediction) in the prediction/output. [21] proposed a methodology named MMD-CRITIC to learn
prototypes and criticisms for a given dataset using the maximum mean discrepancy 
(MMD) as a measure of similarity. [36] introduced MAPLE. This post-hoc local 
agnostic explanation method can also be used as a transparent model due to its inter- 
nal structure. It combines random forests with feature selection methods to return 
feature importance-based explanations. DICE which stands for Diverse Counter- 
factual Explanations [31] is a local, post-hoc and agnostic method that solves an 
optimization problem with several constraints to ensure feasibility and diversity 
when returning counterfactuals. Feasibility is critical in the context of counterfac- 
tuals since it allows avoiding examples that are unfeasible. 

We classify the aforementioned methods according to multiple criteria in Table 5.1.

Table 5.1. Classification of XAI techniques.

Explanation Technique | Reference | Model/Attribution/Example Based | Local (L)/Global (G) | Agnostic (A)/Specific (S) | Data Type
All Convolutional Net | [52] | X X | L | S | IMAGE
Anchors | [39] | X | L/G | A | TABULAR/TEXT
Class Activation Mapping (CAM) | [61] | X | L | S | IMAGE
Contextual Prediction Difference Analysis | [13] | X | L | S | IMAGE
Deconvolutional Networks | [60] | X X | L | S | IMAGE
Deep Learning Important FeaTures (DeepLIFT) | [46] | X | L | S | ANY
DICE | [31] | X | L | A | ANY
DLIME | [58] | X X | L | A | ANY
GradCAM++ | [6] | X | L | S | IMAGE
Gradient | [47] | X | L | S | ANY
Gradient * Input | [46] | X | L | S | ANY
Gradient Weighted Class Activation Mapping (GradCAM) | [45] | X | L | S | IMAGE
Integrated Gradients | [53] | X | L | S | ANY
k-LIME | [16] | X X | L | A | ANY
Layer Wise Relevance Propagation (LRP) | [3] | X | L | A | ANY
LIME | [38] | X X | L | A | ANY
LIMEtree | [50] | X X | L | A | TABULAR
Local Foil Trees | [54] | X X | L | A | TABULAR
LoRE | [14] | X | L | A | TABULAR
MAPLE | [36] | X X | L | A | TABULAR
Meaningful Perturbation | [11] | X | L | S | IMAGE
MMD-CRITIC | [21] | X | G | A | ANY
Prediction Difference Analysis (PDA) | [40, 63] | X | L | S | IMAGE
RISE | [35] | X | L | S | IMAGE
SHAP | [28] | X | L/G | A | ANY
SmoothGrad | [49] | X | | S | IMAGE
STREAK | [9] | | | A | IMAGE
TREPAN | [7] | X | G | S | TABULAR


5.3 Evaluation Measures


Explainability is considered a subjective concept. [30] considers that an AI
system is explainable if either the model is intrinsically interpretable or the
non-interpretable model can be complemented with an interpretable and faithful
explanation. While the XAI techniques provide different kinds of information, the
perceived quality of the explanations depends on the users, the domain, the
information of interest, and the explanation itself. To evaluate the explanations,
it is necessary to define different criteria of goodness for an explanation. Given
an interpretable approximation of a reference model, [25] lists four aspects to be
considered in the evaluation: fidelity (ability to capture the reference model
behavior correctly), unambiguity (ability to provide a single and deterministic
rationale to explain each data instance), interpretability (the approximation
should be human-understandable), and interactivity. The aspect of fidelity is
further elaborated by [22], who considers two properties: soundness (the extent to
which each explanation component is truthful to the reference model) and
completeness (the extent to which the explanation describes the reference model).
[56] enumerates another three criteria: sensitivity, the degree of integration, and
cognitive salience. Sensitivity is defined as the strength of the relationship of
explanatory variables with background conditions: the weaker the relationship, the
more convincing the explanation. The degree of integration refers to the
connectedness of the explanation to a larger theoretical framework. Finally,
cognitive salience is defined as the ease with which the rationale behind the
explanation can be followed.

The aforementioned criteria require different evaluation approaches. [8] identi- 
fied three categories of them: 


— Application-grounded evaluation: grounded in a real-world application, 
collects domain experts' feedback regarding the explanations provided to
them. 

— Human-grounded evaluation: refers to feedback obtained from experiments 
performed with lay users, when no real-world application exists in place. 

— Functionality-grounded evaluation: the evaluation is performed consider- 
ing some formal definition or criteria, that measures the explanation quality. 


To assess the explainability methods, [15] propose three tests for
functionality-grounded evaluations: the Feature Augmentation Test, the Synthetic
Test, and the Feature Deduction Test. The Feature Augmentation Test considers that
if the values of the explainable features of a specific instance are replaced by
the values of those features from an instance with a different label (e.g.,
"new-label"), the classification outcome should be "new-label". The Synthetic Test
is based on the assumption that if the explainability features are accurately
selected, new synthetic instances can be created by preserving the explainability
feature values and assigning random values to the rest of the features without
affecting the forecast outcome. Finally, the Feature Deduction Test considers that
if the explainability features are correctly selected, removing one of them from
the input should lead to a different forecast. Even though this approach is
frequently adopted in the literature [11, 20, 35, 60, 63], it has been pointed out
that samples from which a subset of features is removed follow a different data
distribution than the samples the model was trained on, violating a key machine
learning assumption. The RemOve And Retrain (ROAR) approach was proposed instead:
each feature deemed important is replaced by a non-informative value in the train
and test sets, the model is retrained, and the performance change is measured. In
addition to this technique, a random assignment of feature importance was proposed
as a benchmark to measure the quality of explainability feature extraction
techniques.

There is currently little research regarding application-grounded and
human-grounded evaluations [8, 62]. A popular, domain-specific evaluation method is
to create a heatmap of model sensitivity to region-based perturbations. The main
idea is that perturbing the input variables that are relevant according to the
heatmap should lead to a steeper decline in the prediction score than perturbing
input features with less importance. [22] used questionnaires with short responses
and Likert scales. In contrast, [23] used three quantitative metrics: accuracy,
response time, and subjective satisfaction. The authors measured accuracy and
response time regarding the subject's response to the different tasks proposed in
their research. Subjective satisfaction was measured on a Likert scale for each
explanation. [24] proposed the Human Interpretability Score (HIS, see Eq. 5.1),
which constitutes an alternative metric based on the user's response time. On the
other hand, a wider set of metrics has been reported for functionality-grounded
evaluations.


$$
\mathrm{HIS}(x, R) =
\begin{cases}
RT_{max} - RT_{mean}(x, R), & \text{if } RT_{mean}(x, R) \leq RT_{max} \\
0, & \text{if } RT_{mean}(x, R) > RT_{max}
\end{cases}
\tag{5.1}
$$

Equation 5.1: Human Interpretability Score. Measures how long it takes the user
to predict the label assigned to a certain data point, assigning a cap to the
response time. $x$ and $R$ correspond to the instance and model considered.
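
Reading the Human Interpretability Score as the gap between the response-time cap and the mean response time, floored at zero, a minimal sketch could look as follows. The function name, the use of seconds and the example numbers are assumptions made for illustration.

```python
def human_interpretability_score(response_times, rt_max):
    """HIS for one (instance, model) pair, given user response times in seconds."""
    rt_mean = sum(response_times) / len(response_times)
    if rt_mean > rt_max:
        # Users needed longer than the cap allows: the score bottoms out at zero.
        return 0.0
    return rt_max - rt_mean


fast_users = human_interpretability_score([2.0, 4.0], rt_max=10.0)
slow_users = human_interpretability_score([12.0, 14.0], rt_max=10.0)
```

Under this reading, a higher score corresponds to explanations that users can act on faster.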

Among the metrics proposed by [33] we find Mutual Information, Diversity, 
Monotonicity, Non-sensitivity, and Effective complexity. Mutual Information is con- 
sidered when creating an interpretable data representation. [33] proposes measuring 
Mutual Information in two cases: (i) between the features of the original model
and the subset of explainable features, and (ii) against the target values. Ideally, 
the number of explainable features should be reduced to maximize simplicity and 
broadness, while aiming towards keeping a high fidelity regarding the target label 
(see Eq. 5.2). 


$$
I(x, y) = D_{KL}\left(P_{(x,y)} \,\|\, P_x \otimes P_y\right) \tag{5.2}
$$

Equation 5.2: Mutual Information. Measures the mutual dependence between two
random variables x and y.
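
For discrete variables, the Kullback-Leibler form of Eq. 5.2 reduces to a sum over the joint probability table. The sketch below assumes the joint distribution is already available as a nested list; in practice it would have to be estimated from data, which this example does not cover. The function name is hypothetical.

```python
import math


def mutual_information(joint):
    """I(x, y) in nats, where joint[i][j] = P(x = i, y = j)."""
    p_x = [sum(row) for row in joint]            # marginal of x
    p_y = [sum(col) for col in zip(*joint)]      # marginal of y
    total = 0.0
    for i, row in enumerate(joint):
        for j, p_xy in enumerate(row):
            if p_xy > 0:                         # 0 * log(0) is taken as 0
                total += p_xy * math.log(p_xy / (p_x[i] * p_y[j]))
    return total


independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
dependent = mutual_information([[0.5, 0.0], [0.0, 0.5]])
```

Independent variables give zero, while two perfectly coupled binary variables give log 2 nats.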

Diversity attempts to measure the degree to which a set of rules is integrated into the
explanation (see Eq. 5.3). Monotonicity considers that feature attributions should
be monotonic. [33] proposes measuring it as the Spearman’s correlation between
two vectors: (i) the absolute values of attributions, and (ii) the corresponding
expectations. The intuition behind the Non-sensitivity metric (see Eq. 5.4) is to verify
that the explainability method does not assign any relevance score to the features
the model is not functionally dependent on. The authors compute it as the
cardinality of the symmetric difference between the features assigned zero attribution and
the features the model does not functionally depend on. Effective complexity
measures whether some explanation features can be ignored without significantly affecting the
prediction (see Eq. 5.5).


$$
Diversity = \sum_{x_i, x_j \in E;\; x_i \neq x_j} \frac{d(x_i, x_j)}{N_E} \tag{5.3}
$$

Equation 5.3: Diversity metric. E is the set of examples considered, d is a distance
metric for the space X, while N_E corresponds to the number of examples.


$$
\left| A_0 \,\triangle\, X_0 \right| \tag{5.4}
$$


Equation 5.4: Non-sensitivity. A_0 represents features with zero attribution, X_0 refers
to features on which the model is not functionally dependent. |·| denotes the
set cardinality, and △ the symmetric set difference.
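
The symmetric difference in Eq. 5.4 maps directly onto set operations. The following sketch assumes that attributions are given as a name-to-score dictionary and that the features the model ignores are known in advance, which is usually only true for synthetic or controlled models. All names are hypothetical.

```python
def non_sensitivity(attributions, independent_features, tol=0.0):
    """Cardinality of the symmetric difference between A_0 and X_0 (Eq. 5.4)."""
    # A_0: features that received (numerically) zero attribution
    zero_attribution = {name for name, score in attributions.items() if abs(score) <= tol}
    # X_0: features the model does not functionally depend on
    return len(zero_attribution ^ set(independent_features))


attributions = {"temperature": 0.0, "pressure": 0.3, "batch_id": 0.0}
mismatches = non_sensitivity(attributions, independent_features={"temperature", "pressure"})
```

A value of zero means the explanation ignores exactly the features the model ignores. In the example, `batch_id` and `pressure` are the two mismatches.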


$$
k^* = \arg\min_{k \in \{1, \dots, N\}} |M_k| \quad \text{where} \quad E\left(\ell(y, f_{-M_k}) \mid x_{M_k}\right) < \epsilon \tag{5.5}
$$


Equation 5.5: Effective Complexity. M_k denotes the set of top k features, x denotes
features, ε > 0 corresponds to some arbitrary tolerance, f_{-M_k} is the restriction
of the model R to non-important features, given M_k.

The Local Approximation Accuracy was proposed by [15] to compare the deci- 
sion boundary of the surrogate model against the original one. The authors do so by 
computing the Root Mean Squared Error between the original and surrogate model 
predictions on the test samples. A similar intuition is present in the Disagreement 
metric proposed by [25]. For a classification setting, they attempt to measure the 
surrogate model fidelity by computing the disagreement between labels of the sur- 
rogate model and the original one (see Eq. 5.6). 


$$
Disagreement(R) = \sum_{i=1}^{N} \left| \{ x \mid x \in D,\ x \text{ satisfies } q_i \wedge s_i,\ B(x) \neq c_i \} \right| \tag{5.6}
$$

Equation 5.6: Disagreement metric. Quantifies the disagreement between a surrogate
model R and the reference forecasting model B, given a dataset D. The triplet
(q, s, c) stands for (feature, operator, class).
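
Both fidelity measures above can be sketched in a few lines. The example assumes that each rule of the surrogate is available as a triplet whose first two elements are boolean predicates over an instance and whose third element is the predicted class. This encoding, the toy black-box model and all names are illustrative assumptions, not the representation used in [15] or [25].

```python
import math


def local_approximation_rmse(original_preds, surrogate_preds):
    """Root Mean Squared Error between original and surrogate model predictions."""
    squared = [(o - s) ** 2 for o, s in zip(original_preds, surrogate_preds)]
    return math.sqrt(sum(squared) / len(squared))


def disagreement(rules, dataset, black_box):
    """Eq. 5.6: count instances covered by a rule whose class differs from B(x)."""
    total = 0
    for q, s, c in rules:
        total += sum(1 for x in dataset if q(x) and s(x) and black_box(x) != c)
    return total


# Toy setting: the reference model B switches class at 25 degrees,
# while the surrogate rules switch at 40 degrees.
dataset = [{"temp": 10}, {"temp": 30}, {"temp": 50}]
black_box = lambda x: "hot" if x["temp"] > 25 else "cold"
rules = [
    (lambda x: True, lambda x: x["temp"] > 40, "hot"),
    (lambda x: True, lambda x: x["temp"] <= 40, "cold"),
]
mismatched = disagreement(rules, dataset, black_box)
rmse = local_approximation_rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Only the instance at 30 degrees is labeled differently by the surrogate, so the disagreement count is one.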

[25] propose another six metrics to evaluate forecast explanations: rule overlap, 
cover, the rule set size (see Eq. 5.7), the rule set maximum width, the number 
of descriptor sets, and feature overlap. The Rule overlap computes the overlap 
between pairs of rules defined in the surrogate model. It is expected that the lower 
the overlap, the lower the surrogate model ambiguity (see Eq. 5.8). Cover is defined 
as the number of instances that match a given rule from the surrogate model (see 
Eq. 5.9). The Maximum Width refers to the maximum width obtained from com- 
puting the width over all the elements from the surrogate model. The authors define 
an element as either rule conditions or neighborhood descriptors (see Eq. 5.10). 
The authors define the Number of Unique Descriptor Sets as the number of 
unique neighborhood descriptors provided in the surrogate model (see Eq. 5.11). 
Finally, the Feature overlap measures the feature overlap between every pair of
unique neighborhood descriptor and rule (see Eq. 5.12).


RuleSetSize(R) = NumberOfRules(q, s, c) (5.7) 


Equation 5.7: Rule set size. R denotes the decision set. The triplet (q, s, c) stands
for (feature, operator, class). The triplets are contained in the decision set.


$$
RuleOverlap(R) = \sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} overlap(q_i \wedge s_i,\; q_j \wedge s_j) \tag{5.8}
$$


Equation 5.8: Rule overlap. R denotes the decision set. The triplet (q, s, c) stands
for (feature, operator, value).


$$
cover(R) = \left| \{ x \mid x \in D,\ x \text{ satisfies } q_i \wedge s_i, \text{ where } i \in \{1, \dots, N\} \} \right| \tag{5.9}
$$

Equation 5.9: Cover. R denotes the decision set. The triplet (q, s, c) stands for
(feature, operator, value). D represents a dataset, and x an instance in such dataset.


$$
MaximumWidth(R) = \max_{e \in \bigcup_{i=1}^{N} (q_i \cup s_i)} width(e) \tag{5.10}
$$


Equation 5.10: Maximum Width. R denotes the decision set. e represents elements,
which can be either rule conditions or neighborhood descriptors.


$$
NumberOfUniqueDescriptorSets(R) = |dset(R)|, \quad \text{where } dset(R) = \bigcup_{i=1}^{N} q_i \tag{5.11}
$$

Equation 5.11: Number of Unique Descriptor Sets. R denotes the decision set, and
q denotes features.


$$
FeatureOverlap(R) = \sum_{q \in dset(R)} \sum_{i=1}^{N} FeatureOverlap(q, s_i) \tag{5.12}
$$


Equation 5.12: Feature Overlap. R denotes the decision set, q denotes features in
descriptor sets, and s denotes operators.

A different set of metrics is considered by [37], who for tree-based models mea- 
sured the mean path length, the mean number of distinct features in a path, the 
number of nodes, and the number of nonzero features. Finally, [48] reported assess- 
ing explainability methods based on the total number of runtime operation counts 
performed by the model when computing the forecast for a given input. 


5.4 Applications, Use Cases and Open Issues


Though multiple XAI methods exist, they do not suffice by themselves to provide 
human-understandable explanations. They are built into frameworks and appli- 
cations that provide a convenient interface and additional context to achieve that 
goal. One such framework is bLIMEy [51], which decomposes surrogate models 
into three steps: interpretable data representation (transform data from the original 
to the interpretable domain), data sampling, and explanation generation. [18] fol- 
lows a similar approach and describes the IBEX (Interactive Black-box EXplanation 
system) framework with two components: an explainer that produces explanations 
based on user’s needs, and a sampling component, that selects appropriate inputs 
to create the explanation. [2] describes AI Explainability 360, an extensible toolkit
that provides contextual explainers based on the stage of the AI model
development pipeline, kind of model, and explanation requirements. [34] explores
whether the usage of domain knowledge encoded in an ontology improves the quality of the
explanations. [42] explores the usage of semantic technologies to abstract relevant
concepts encoded in the features, avoid exposing sensitive details regarding the fore- 
casting model, and provide higher-level information to the users. The authors com- 
plement model explanations with information regarding real-world events reported 
in the media that likely influenced the variables of interest. [41] developed an ontol- 
ogy to model user’s feedback based on a given forecast and provided explanations. 
[59] developed an intelligent assistant for manufacturing, which creates directive 
explanations for the users using heuristics and domain knowledge. The application 
tracks users’ implicit and explicit feedback regarding local forecast explanations,
enabling application-grounded evaluations.

The integration of explainability methods into applications makes it possible to provide
relevant information regarding model forecasts to different stakeholders. For
instance, data scientists and machine learning engineers require low-level data to
monitor the AI model behavior, identify corner cases, and work towards a more
accurate and robust model. On the other hand, employees and supervisors require
high-level insights that convey the reasons behind the model forecasts, and need to interactively
explore different “what-if” scenarios and provide feedback regarding the
explanations provided. We envision that explainability methods can be useful in a wide range of
manufacturing use cases, such as automatic defect detection (informing the user about the
image regions influencing the decision), production planning (providing insight
into the opportunity cost of different scheduling decisions), or demand
forecasting (providing insights into why we expect demand to take place and which
factors affect the quantity estimates).

Several explainability techniques have been implemented in the manufacturing 
domain and specifically the predictive quality management domain (Quality 4.0) 
to boost the transparency of AI deployed models. [12] used XAI techniques such as 
CAM and Contrastive gradient-based saliency maps to explain black-box classifiers 
in the area of quality welds in ultrasonically welded battery tabs. They produced 
heatmaps where they visualized several color maps to gain insights into true positive 
versus false-positive predictions. [27] implemented several XAI methods to provide 
explanations for domain experts in the area of defect classification of thin-film- 
transistor liquid-crystal display panels. Techniques such as CAM, LRP, integrated 
gradients, guided backpropagation, and SmoothGrad were implemented and visu- 
alized on a VGG-16 classification model. Based on the visualized results, LRP and 
guided backpropagation were selected as they produced well-distributed heatmaps. 
Moreover, by fitting the model into a decision tree and converting the prediction 
results into human interpretable text, the authors achieved the maximum level of 
explainability when they presented the results to domain experts for evaluation
purposes. In the area of manufacturing cost estimation, [57] described a method based
on visualizing the machining features of a 3D computer-aided design model
that drive the increase in manufacturing costs. For this purpose,
3D gradient-weighted class activation mapping was applied as the XAI method.

Cybersecurity is a transversal concern related to all smart manufacturing cases.
XAI techniques have been successfully applied in the cybersecurity domain to support
the exploration of model vulnerabilities [26, 29] and to identify perturbed data
samples [10].

In the European Horizon 2020 project STAR (Safe and Trusted Human Centric
Artificial Intelligence in Future Manufacturing Lines), XAI is used to provide
insights on the most relevant features for each forecast, explore model vulnerabilities
and help identify potential data poisoning. While accurate explanations
of forecasts give users additional elements for decision-making, the vulnerability
assessment and early identification of data poisoning help ensure the system is
secure, enhancing users’ trust in the system.


5.5 Conclusion 


The new industrial revolution relies on AI to enable higher production efficiency
and safer operations. XAI techniques provide means to reduce the opaqueness of
black-box models and increase trust in the system. In this contribution, we introduce
the field of XAI. We list several taxonomies found in the literature alongside
state-of-the-art methods and techniques to interpret AI models. We also include metrics
with different qualitative and quantitative characteristics as a means of evaluating
the above methods. Finally, we list applications of XAI, describe several use cases in
the manufacturing domain, and outline open opportunities.

XAI requires a multi-disciplinary approach. Special consideration needs to be 
given to understand how domain experts and end-users operate. Users must be 
involved in the XAI outcomes validation. The integration of XAI into manu- 
facturing processes will be paramount for the transition into the fifth industrial 
revolution.


Confidence Assessment of AI Models
in Simulated Industrial Environments


By Spyros Theodoropoulos, Dimitrios Dardanis, Georgios Sofianidis,
Jože M. Rožanec, Panagiotis Tsanakas and Dimosthenis Kyriazis


The deployment of artificial intelligence (AI) solutions in simulated industrial envi- 
ronments, such as manufacturing production lines, minimizes the risks of physical 
damage caused by potential agent errors or malfunctions. Leveraging synthetic data 
generation and data augmentation techniques can increase the accuracy and
robustness of an AI solution. To that end, artificially generated adversarial scenarios can
be exploited to assess an AI agent’s confidence level and quality. This chapter
presents state-of-the-art techniques that aim to improve the confidence
assessment of manufacturing-focused AI agents, spanning the fields of Reinforcement
Learning, Explainable AI and Visual Analytics.


6.1 Introduction


At the heart of AI-powered Industry 4.0 systems lie modern machine learning
(ML) methods such as Deep Learning and Reinforcement Learning. Such
algorithms consume large amounts of data available through smart factory IoT sensors
to support automated decision-making and process optimization, and to achieve improved
working conditions for human workers.

However, these algorithms often exhibit complex stochastic behavior that can be
difficult or impossible for humans to understand. At the same time,
manufacturing is a domain where the risk of a mistaken action can affect the well-being of the
human worker, the integrity of the production process, or the quality of the end
product. Therefore, transparency, safety, and trustworthiness are fundamental
properties that should be considered when designing AI models that interact physically
with the shop floor.

Another important aspect is that as these models grow in complexity and sophis- 
tication, so does their need for larger sets of training data. These sets might be 
smaller than required in practice, suffer from poor quality, or misrepresent the 
actual real-life domain of the problem modeled. 

Simulation is a valuable tool for countering those issues, as it can augment
the available input data, speed up the algorithm’s training process, and help with
its subsequent validation and robustification by producing original samples and
scenarios. These can be used to enhance the algorithm’s predictability and
trustworthiness, especially when combined with confidence assessment methods and
transparency enhancing techniques such as XAI and Visual Analytics. Their use in
real-life AI deployments becomes necessary because, despite their impressive results,
algorithms based on Deep Learning and Reinforcement Learning still contain hidden
aspects that humans cannot completely understand or control.

Confidence assessment techniques aim to make sure that an AI-powered
machine will not act in a completely unpredictable way, endangering human
workers or derailing the production process. We have seen XAI as a remedy providing
us with understandable explanations of AI decisions. Another way is to try to
mathematically quantify the confidence in an algorithm’s decision or prediction.
Such quantification can be achieved by modifying the algorithm to keep track of
its confidence or by applying another AI algorithm or a statistical method over the
outputs. The confidence assessment is a crucial step in larger model pipelines, as it
determines whether a fallback method should be used or whether human
intervention is necessary through human-in-the-loop and active learning techniques.

This chapter focuses on well-studied manufacturing
use cases that are also predominant in the STAR H2020 project, namely defect
detection and robotic pick-and-place tasks.


6.2 Simulated Reality


A Short Primer on Few-shot Supervised Learning

Figure 6.1. Categories of few-shot learning [1].


In the more straightforward case of supervised learning, the reality we are
trying to simulate is the distribution generating the input images. Even though some
classes of samples might be underrepresented, we can use the information we
already have and transform or synthesize it to generate new samples. This process,
namely data augmentation, is part of an ML sub-field called few-shot learning.

Few-shot learning (FSL) [1] is a set of techniques aimed at reducing the amount
of training data needed for an algorithm and is therefore closely related to Simulated Reality.
The problem can be addressed in three areas: the input data, the model, and the
optimization algorithm, as seen in Figure 6.1.

H is the hypothesis space, or space of the family of models (e.g., all CNNs of
a specific architecture). The optimization algorithm moves through this space by
learning better and better parameters, moving from the starting point to the learned
hypothesis h_I (note that h_I depends on the training dataset), which represents the final
learned parameters. ε_est is the estimation error due to learning inefficiency (e.g.,
overfitting) and ε_app the approximation error, due to the limited capacity of the
hypothesis space. What FSL tries to do is bring the “start” point closer to h*, the best hypothesis in H,
faster than model training that would require an extensive collection of samples.
For example, model-based techniques such as transfer learning try to constrict H
to H̃, a smaller hypothesis space learned from another similar problem with a high
chance of including h*. On the other hand, the “algorithm” category tries to use
prior knowledge over the learning rate and direction of the optimizer to decrease the
number of model updates. With data augmentation, which is our main focus, we
are trying to improve the accuracy gained by the model by synthesizing additional
samples and bringing the final stage h_I of the training closer to h*.


6.2.1 Data Augmentation in Visual Defect Detection


In a use case such as visual quality inspection, a data augmentation approach is often
necessary, as defects rarely occur in manufacturing and several classes of defects can
be severely underrepresented. It is even more critical when we want to
check the robustness of an algorithm against previously unknown defect types or
known occurrences that deviate largely from the training samples. There are various
methods in the literature. The more traditional ones are based on image processing
transformations, while more sophisticated methods attempt to create
synthetic data using Variational Autoencoders, Generative Adversarial Networks
(GANs) and Neural Style Transfer (NST).

The traditional approach to image data augmentation generates image data 
through various transformations, such as scaling, rotation, translation, shearing, 
blur, or illumination. However, those image-level transformations do not con- 
tribute sufficiently to the clearer separation between different classes, especially 
when the separation depends on higher level features [2]. To overcome the limita- 
tions of traditional image processing methods, Convolutional Variational Autoen- 
coders (CVAEs) have been proposed. Those consist of two CNNs: the encoder, 
which maps the input image to a latent space of lower dimensionality, and the 
decoder, which generates a new varying reconstruction of the original image from 
the latent features. CVAEs have been used successfully in [3] to augment underpop- 
ulated defect classes on a dataset of metal surfaces. The classification output from 
a six-class CNN over the augmented dataset ended up having nearly perfect pre- 
cision and recall scores, with a significant improvement over the pipeline without 
the CVAE. 
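
The traditional image-level transformations listed at the start of this discussion can be sketched with NumPy alone. The example below is illustrative only: the transformation ranges, the 8x8 toy image and the function name are assumptions, and it does not reproduce the CVAE pipeline of [3].

```python
import numpy as np


def augment(image, rng):
    """Return one randomly transformed copy of a square grayscale image in [0, 1]."""
    out = np.rot90(image, k=int(rng.integers(0, 4)))    # rotation by a multiple of 90 degrees
    if rng.random() < 0.5:
        out = np.fliplr(out)                            # horizontal flip
    out = np.roll(out, int(rng.integers(-2, 3)), axis=1)  # small wrap-around translation
    out = out * rng.uniform(0.8, 1.2)                   # illumination change
    out = out + rng.normal(0.0, 0.02, size=out.shape)   # additive sensor-like noise
    return np.clip(out, 0.0, 1.0)


rng = np.random.default_rng(0)
defect = np.zeros((8, 8))
defect[3:5, 2:6] = 1.0                                  # a toy scratch-like defect
augmented = [augment(defect, rng) for _ in range(5)]
```

Such transformations enlarge an underrepresented class cheaply, but, as noted above, they do not change the higher-level features that separate defect classes.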

Generative Adversarial Networks (GANs) are another important tool, especially 
for restoring balance in skewed datasets through synthetic image generation. They 
can efficiently address different kinds of imbalances such as inter-class, intra-class 
(e.g., person re-identification), object and pixel-level imbalances for segmentation 
tasks [4]. A concrete use-case for defect detection is presented in [5] using Wasser- 
stein GANs. The method is used to detect burn-through and crack defects on weld- 
ing joints which were found difficult to generate with traditional image processing 
methods. The data augmentation framework consists of two rival networks: a gen- 
erator that generates fake images from random noise and a discriminator network 
tasked to distinguish between real and fake images. The original purpose of the gen- 
erator network is to deceive the discriminator. However, here, it is also re-purposed 
to generate plausible, high-quality defect images. The generator consisted of six
deconvolutional layers with ReLU activations (and tanh for the last layer). The discriminator
had four convolutional layers with leaky ReLU and produced up to seven defect
classes. The synthetic data helped the network perform well with less than 104
original samples, producing misclassifications only between defect types and
perfectly separating the “normal” class.

Neural Style Transfer attempts to fuse two images: a “style” image and a
“content” image. Starting from a random noise image, it tries to minimize
the two losses for style and content simultaneously. While initial methods
implemented only global style transfer, [6] uses the technique described in [7] to fuse
defects only with local regions of the content image. The local fusion algorithm
iterates over two steps. First, it uses a patch-match method to find a patch in the style
area similar to one in the content image area for replacement. Then, it further trains
the network using histogram and variational loss functions to improve smoothness
on patch boundaries. The generated images were finally given as input to a
segmentation network to detect defects in buttons and showed quite promising results
against the vanilla segmentation and other generators based on CycleGAN [8] and
histogram matching.


6.2.2 Simulated Reality in Reinforcement Learning 
for Robotic Control 


In reinforcement learning (RL), simulation is of particular importance as a remedy 
for the sample complexity problem. RL algorithms, depending on the complex- 
ity of the domain, need many episodes of trial and error to learn efficient policies. 
These episodes can be expensive and risky to obtain, especially in high-risk real- 
world settings such as manufacturing sites, often making the use of simulation a 
necessity. Additionally, in an artificial simulation environment, forbiddingly risky 
policies can also be explored to help guarantee robustness or discover new, improved 
policies that would be inaccessible by sticking to a more conservative policy. Apart 
from sample complexity, simulation can also help with the transfer of knowledge. 
Learning in an abstracted environment that only retains the necessary elements for 
a particular task could make the learned policies more generalizable and adjustable 
to different settings (e.g., part handling of different parts or different production 
lines). Finally, simulation can provide an additional layer of safety where different 
possible consequences of an action can be observed and their results validated to 
avoid real-world accidents. Next, we will focus on approaches to bridge the sim- 
to-real gap mainly applied to the well-studied area of robotic grasping. This issue 
appears due to the accumulation of minor errors caused by the unavoidable inac- 
curacies in the simulation’s physics model and visual rendering. 

Domain adaptation is a set of techniques that help a learning model generalize 
to a target domain while trained with samples from different sources. In robotic 
grasping, simulation is the source domain, and the actual production line is the 
target. Domain adaptation is widely used in computer vision. It can be roughly
divided into two categories: feature-level and pixel-level. Feature-level adaptation is
usually based on adaptive feature extraction methods such as CNNs, which already
have some degree of transferability between the simulation and reality domains. 
Including a domain-level similarity metric in their loss function, such as maximum 
mean discrepancy, can help enforce domain invariance when retraining in the new 
domain [9]. Pixel-level domain adaptation is mainly based on using GANs to restyle 
simulation images so that they look more similar to real ones [8]. Both of the above 
techniques can work well on Deep Reinforcement Learning algorithms that base 
their perception and action planning on CNNs. A good example is GraspGAN
[10], which uses simulation with a hybrid adaptation method, combining
Domain-Adversarial Neural Networks (DANNs) with a novel batch-normalization
technique. The proposed method achieved performance comparable to or better than vanilla
Deep RL with fifty times fewer real-world samples.
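
The maximum mean discrepancy mentioned above as a domain-level similarity metric can be estimated from two batches of feature vectors. The sketch below uses the biased estimator with a Gaussian (RBF) kernel; the kernel choice, the bandwidth and the synthetic feature batches are assumptions made for illustration and are not taken from [9].

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of the squared MMD between two feature batches."""
    k_ss = rbf_kernel(source, source, bandwidth).mean()
    k_tt = rbf_kernel(target, target, bandwidth).mean()
    k_st = rbf_kernel(source, target, bandwidth).mean()
    return k_ss + k_tt - 2.0 * k_st


rng = np.random.default_rng(0)
sim_features = rng.normal(0.0, 1.0, size=(200, 4))    # stand-in for simulation features
real_features = rng.normal(0.5, 1.0, size=(200, 4))   # stand-in for real-world features
penalty = mmd_squared(sim_features, real_features)
```

Added to the task loss with a weighting factor, such a term penalizes feature extractors whose outputs differ between the simulated and the real domain.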

Domain randomization methods have also shown good results for the task of
robotic grasping, making simulation-only training feasible. The goal is to train the
agent on a broader set of environmental conditions by introducing randomization
in the simulated environment at training time. Provided that the variability of the
conditions is sufficient, the model trained in the simulation will be able to generalize to
the real world. For instance, [11] uses randomization on the following types of
features: addition of distracting objects of different shapes and sizes, object position
and texture, the texture of background objects, camera position, orientation and
field of view, number and position of lights and addition of different types of
random noise. The trained model produced comparable results to real-world training,
even though no real-world data was used.
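
In code, domain randomization amounts to drawing a fresh scene configuration for every training episode. The sketch below samples the feature types listed above; the field names and numeric ranges are arbitrary placeholders rather than the settings of [11], and the hand-off to a renderer or physics engine is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneConfig:
    num_distractors: int
    object_xy: tuple
    object_texture: int
    background_texture: int
    camera_offset: tuple
    camera_fov_deg: float
    num_lights: int
    noise_std: float


def sample_scene(rng):
    """Draw one randomized scene; all ranges are illustrative assumptions."""
    return SceneConfig(
        num_distractors=rng.randint(0, 10),
        object_xy=(rng.uniform(-0.3, 0.3), rng.uniform(-0.3, 0.3)),
        object_texture=rng.randrange(1000),
        background_texture=rng.randrange(1000),
        camera_offset=tuple(rng.uniform(-0.05, 0.05) for _ in range(3)),
        camera_fov_deg=rng.uniform(40.0, 70.0),
        num_lights=rng.randint(1, 4),
        noise_std=rng.uniform(0.0, 0.05),
    )


rng = random.Random(0)
scenes = [sample_scene(rng) for _ in range(1000)]   # one configuration per episode
```

The wider these ranges are relative to the real setup, the more likely it is that the real environment looks like just another sample to the trained model.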

A third approach is outlined in [12], where a mapping is learned that can
translate real-world images to their simulated equivalent. Domain randomization is
utilized in this context to create the pairs of inputs and labels for the training of
the translating network, called a Randomized-to-Canonical Adaptation Network 
(RCAN) [13]. The non-randomized simulated representation is referenced as the 
canonical representation and used to train the grasping algorithm. The RCAN is 
used during the real-world operation to translate real-world images into the canon- 
ical representation that the RL algorithm understands. A pipeline with RCAN and 
QT-Opt [14], a recent RL algorithm, manages to learn how to grasp previously 
unknown objects with high accuracy and a hundred times fewer episodes by train- 
ing in the simulator only. 

Finally, for the above methods to work, it is important to have an appropriate 
simulation environment. There are various open-source off-the-shelf options such 
as Gazebo [15], OpenSim [16], MuJoCo [17] and Bullet [18]. Choosing the right 
tool depends heavily on how well its features fit the requirements of the task at
hand. A recent survey [19] seems to favor MuJoCo for robotic grasping tasks,
though a model-based approach is used in the experiments. In GraspGAN [10],
the robotic arm simulation was based on Bullet, prioritizing the amount of
possible environmental diversity to ease the transfer to a real-world setting.


6.3 Confidence Assessment of AI Methods


The previous section illustrated how simulation can assist the ML training process
and improve the performance of AI solutions. In real-world manufacturing
environments, the notion of performance extends to the reliability and transparency
of an AI system. This section explores methods of confidence assessment as
a way to improve an AI solution in a real-world industrial environment. To that
end, the notion of confidence is examined from two different perspectives.
The first examines methods of evaluating the confidence levels of an AI algorithm’s
predictions. The second focuses on combining methodologies that
enhance human cognition by providing deeper insights into AI’s inner structures
and decision-making processes.


6.3.1 Assessing the Confidence of Deep Neural 
Network Predictions 


As we saw in the previous sections, DNNs, and especially convolutional ones, are
powerful learning models. This has come at a cost, however: as the model
complexity of neural networks grows, which also increases their test
accuracy, so does their overconfidence in their predictions [20]. In this section, we
focus on the problem of classification. What we refer to as confidence in a
classification setting is the maximal value of the last softmax layer, which determines the
class of a given input. This is compared with the accuracy of the network for a given
class. A good way to visualize this is a reliability diagram, which shows the corresponding
accuracy for each confidence range [21]. Fig. 6.2 is an example
for the five-layer LeNet model vs. the 110-layer ResNet on the CIFAR-100 dataset.
Ideally, accuracy and confidence would be identical.
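
The data behind a reliability diagram can be computed by grouping predictions into confidence ranges and comparing the mean confidence with the accuracy in each range. The sketch below uses ten equal-width bins, which is an assumption. It also reports the weighted average gap between the two, commonly called the expected calibration error, as a single-number summary that the text above does not rely on.

```python
def reliability_bins(confidences, correct, num_bins=10):
    """Return (mean confidence, accuracy, count) for every non-empty confidence bin."""
    bins = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * num_bins), num_bins - 1)   # confidence 1.0 falls in the last bin
        bins[index].append((conf, ok))
    summary = []
    for items in bins:
        if items:
            mean_conf = sum(c for c, _ in items) / len(items)
            accuracy = sum(1 for _, ok in items if ok) / len(items)
            summary.append((mean_conf, accuracy, len(items)))
    return summary


def expected_calibration_error(confidences, correct, num_bins=10):
    """Average |accuracy - confidence| over bins, weighted by bin size."""
    n = len(confidences)
    return sum(count / n * abs(acc - conf)
               for conf, acc, count in reliability_bins(confidences, correct, num_bins))


# An overconfident toy model: 95% confidence but only half of the predictions are right.
gap = expected_calibration_error([0.95, 0.95, 0.95, 0.95], [True, True, False, False])
```

Plotting accuracy against mean confidence for each bin gives the diagram itself; a perfectly calibrated model lies on the diagonal and has a gap of zero.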

The main reason for this increasing miscalibration due to increasing model
complexity is that DNNs also suffer from a more subtle case of overfitting. Namely,
they tend to overfit the negative log-likelihood loss invisibly, while their
visible generalization accuracy measured by a 0/1 loss seems to remain stable. This is
a sign of unreliability that has limited DNN use in real-world safety-critical
applications.

Figure 6.2. Reliability Diagram, LeNet vs. ResNet [21].

Many methods have been proposed to mitigate this overconfidence. The first
category tries to adjust softmax outputs as a post-processing step so that they resemble the actual
confidence probabilities (calibration methods) or follow an ordering where a higher
value corresponds to higher “true” confidence. Histogram Binning [22], Isotonic
Regression, and Bayesian Binning into Quantiles (BBQ) [23] are example methods that
solve optimization problems after the model training to bring softmax outputs close
to their confidence values as estimated on a validation set. Platt Scaling [24] and its
generalizations, Matrix Weighting [20] and Temperature Scaling [25], are applied
on the “logit” layer just before the softmax, aiming to calibrate the weights of the
final layer so that outputs are close to the validation set confidence probabilities.
Temperature scaling is the most popular one. It does not influence the ordering of
the class predictions, guaranteeing the same class prediction as before.
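
Temperature scaling divides the logits by a single scalar T before the softmax and chooses T to minimize the negative log-likelihood on a validation set. The sketch below replaces the usual gradient-based fit with a simple grid search to stay dependency-free; the grid, the toy logits and the function names are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, shifted by the maximum for numerical stability."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(logits_batch, labels, temperature):
    losses = [-math.log(softmax(z, temperature)[y]) for z, y in zip(logits_batch, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logits_batch, labels, candidates=None):
    """Pick the temperature with the lowest validation NLL (grid search, an assumption)."""
    if candidates is None:
        candidates = [0.5 + 0.05 * i for i in range(91)]   # 0.5, 0.55, ..., 5.0
    return min(candidates, key=lambda t: negative_log_likelihood(logits_batch, labels, t))


# Validation set: the model always prefers class 0 with high confidence, but is right 3 times out of 4.
val_logits = [[4.0, 0.0]] * 4
val_labels = [0, 0, 0, 1]
temperature = fit_temperature(val_logits, val_labels)
calibrated = softmax([4.0, 0.0], temperature)
```

In this toy case the fitted temperature is above one, which lowers the reported confidence from about 0.98 to roughly the observed accuracy of 0.75 while class 0 remains the predicted class.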

Alternatively, if one is interested only in an ordering of class confidence
estimates, estimating the approximate distance from class boundaries
[26] is another option. The second category of confidence assessment
methods changes the learning algorithm so that the training process is
also constrained to output reasonable measures of the model’s “true” confidence.
Most notable is the addition of a term to the loss function that penalizes ordering
inconsistencies in the output (pseudo-)probabilities [27]. Moreover, regularization
techniques such as dropout, weight decay, label smoothing [28] and mixup [29]
have been shown to improve confidence estimates.
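
Two of the regularization techniques named above only change the training targets, which makes them easy to sketch. The example assumes one-hot label vectors and flat feature lists; the smoothing factor of 0.1 and the Beta parameter of 0.2 are common illustrative values, not settings reported in [28] or [29].

```python
import random


def smooth_labels(one_hot, epsilon=0.1):
    """Label smoothing: move a fraction epsilon of the mass to a uniform distribution."""
    num_classes = len(one_hot)
    return [(1.0 - epsilon) * v + epsilon / num_classes for v in one_hot]


def mixup(x1, y1, x2, y2, rng, alpha=0.2):
    """Mixup: convex combination of two samples and of their label vectors."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


soft = smooth_labels([0.0, 0.0, 1.0])
rng = random.Random(0)
mixed_x, mixed_y, lam = mixup([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], rng)
```

Both produce targets that are never exactly 0 or 1, which discourages the network from pushing its softmax outputs to the extremes.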


6.3.2 Confidence and Reliability in Reinforcement Learning 


Reinforcement learning applications in a real-world setting face significant chal- 
lenges. The main difficulty lies in the fact that the agent acts autonomously. There- 
fore, any deviation from its usual way of operating will directly and potentially 
negatively impact the environment. An overview of possible Reinforcement Learn- 
ing (RL) failures in practice was given by researchers of OpenAI in [30]. The 
most common pitfalls in the current state of the art are “negative side-effects” and 
“distribution shift”. Negative side-effects occur when the agent tries to learn an
objective function that focuses on a specific aim but ignores other important factors
in the environment, usually considered as “common sense” knowledge by humans.
“Distribution shift” refers to scenarios where the agent makes catastrophic mistakes 
while trying to adjust to changes in the underlying environment. 

The easiest remedy for those problems in practical RL systems is to use a hard- 
coded reaction to potentially catastrophic events (e.g., forcing a robot to stop before 
colliding with an unexpected obstacle or keeping a drone high enough above the 
ground to avoid collision). However, as the domain complexity increases, it is 
increasingly challenging to incorporate all these scenarios beforehand. A few other 
ways of reducing exploration risks are described in [31], including filtering actions 
through reward thresholds and changing the total reward objective to include risk 
terms. A good example of the latter is [32], which uses Conditional Value
at Risk (CVaR) as a criterion for policy gradient optimization. Another alternative
is to estimate a lower confidence bound on the expected reward of a trajectory in an
off-policy manner [33]. Finally, distribution shift can be countered by reachability
analysis and the initial adoption of conservative policies, e.g., through robust policy 
improvement [34] or imitation learning [35]. 
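
To illustrate why a risk term such as CVaR changes how policies are ranked, the sketch below computes the empirical CVaR of episode returns, taken here as the mean of the worst alpha-fraction of returns. It is not the policy gradient method of [32]; the two return samples are invented for illustration.

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Empirical CVaR: mean of the worst alpha-fraction of episode returns."""
    ordered = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(ordered))))
    return float(ordered[:k].mean())

# Hypothetical episode returns of two policies.
safe_policy = [9, 10, 10, 11, 10, 9, 10, 11, 10, 10]
risky_policy = [15, 15, 15, 15, 15, 15, 15, 15, 15, -30]

for name, returns in (("safe", safe_policy), ("risky", risky_policy)):
    print(name, "mean:", np.mean(returns), "CVaR(0.1):", cvar(returns, 0.1))
```

The risky policy has the higher mean return but a far worse lower tail, so an objective that includes CVaR prefers the safe policy even though a purely expected-reward objective would not.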

Even if different improvements are added for making RL agents more risk-aware, 
it is still beneficial to extract confidence estimates from outside the agent, right 
between its choice for the best next action and the action’s execution in the real 
world. This can be achieved by anomaly detection techniques used for active learn- 
ing [36, 37]. The inclusion of a human in the AI loop seems quite attractive for 
implementing real-life RL systems. However, such human intervention might be
time-consuming or even impossible if the agent operates on very short time scales. Human
Intervention Reinforcement Learning (HIRL) [38] tries to solve this issue by
complementing human intervention with AI. The human initially monitors the RL
agent’s decisions. If one of them is catastrophic, the human replaces it with a safe
action and assigns it a large negative reward. A classifier, called a Blocker, monitors
the human interventions and learns to imitate the human’s blocking behavior. After a
sufficiently long amount of time (and samples), the Blocker takes over the monitoring
of the RL agent. This approach has been shown to be very effective against catastrophic
forgetting.¹ As the RL agent executes its policies for many episodes without taking any
catastrophic actions, its value estimation becomes more and more optimistic,
underestimating the negative value of those actions. The Blocker, trained on the human
operator’s decisions and therefore holding a more constant perspective across time, can
remind the RL agent of forgotten pitfalls.
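
The two phases of this scheme can be sketched on a toy problem. The following is a simplified illustration under strong assumptions, not the implementation of [38]: the environment is a line of positions with a cliff to the left of position 0, the "human" is a hard-coded rule, the agent proposes random actions instead of learning, and the Blocker merely memorizes the human labels where a real system would train a classifier.

```python
import random

N_STATES = 6          # positions 0..5; stepping left from 0 falls off a cliff
ACTIONS = (-1, +1)
SAFE_ACTION = +1      # replacement action used when a proposal is blocked
PENALTY = -10.0       # large negative reward attached to blocked proposals

def human_says_catastrophic(state, action):
    """Stand-in for the human overseer (hypothetical rule for this toy world)."""
    return state + action < 0

class Blocker:
    """Memorizes human labels; a real Blocker would be a trained classifier."""
    def __init__(self):
        self.labels = {}

    def observe(self, state, action, blocked):
        self.labels[(state, action)] = blocked

    def should_block(self, state, action):
        return self.labels.get((state, action), False)

def run(steps, overseer, blocker, rng, train_blocker):
    """Run the agent under an overseer; return catastrophes and penalty records."""
    state, catastrophes, penalties = 2, 0, []
    for _ in range(steps):
        action = rng.choice(ACTIONS)          # stand-in for the RL agent's proposal
        blocked = overseer(state, action)
        if train_blocker:
            blocker.observe(state, action, blocked)
        if blocked:
            penalties.append((state, action, PENALTY))  # would be fed back to the agent
            action = SAFE_ACTION
        nxt = state + action
        if nxt < 0:
            catastrophes += 1
        state = max(0, min(nxt, N_STATES - 1))
    return catastrophes, penalties

blocker = Blocker()
# Phase 1: the human oversees the agent while the Blocker learns from the interventions.
phase1_catastrophes, phase1_penalties = run(
    500, human_says_catastrophic, blocker, random.Random(0), train_blocker=True)
# Phase 2: the Blocker takes over the monitoring.
phase2_catastrophes, _ = run(
    500, blocker.should_block, blocker, random.Random(1), train_blocker=False)
print(phase1_catastrophes, phase2_catastrophes, len(phase1_penalties))
```

The sketch only covers the oversight mechanics; in a full system the penalty records would enter the agent's learning update, and the Blocker would need to generalize to state-action pairs that the human never labeled.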


1. Catastrophic forgetting describes the fact that an artificial neural network completely and abruptly forgets
what was previously learned upon learning new information.


6.3.3 Visual Analytics and XAI Methods as a Way
to “Peer Through the Black-box”


Despite the recent increase in the research of ML models, and especially DNNs and 
Graph Neural Networks (GNNs), their internal complexity is still often referred to 
as a “black box” [39, 40]. Due to their huge numbers of hyperparameters (up to 
thousands) combined with non-linear transformations, DNNs might seem obscure 
to important manufacturing stakeholders such as factory workers or production line 
managers, thus generating an issue of trust. That is why the notion of AI decision 
“confidence” as “the belief that a choice or a proposition is correct based on the 
available evidence” [41] is of paramount importance, especially in a dynamically 
changing environment, such as a production line. Additionally, the complementary 
notions of interpretability or explainability [42, 43] are key in fostering stakeholder 
understanding and trust. The field of explainable AI (XAI) aims to address this issue
by generating valuable insights into the inner structures of ML algorithms and is 
a rapidly growing field of research that also includes visualization techniques to 
provide clarity to domain experts. 

There is a vast number of XAI methods described in the literature, which can be
classified based on different criteria such as (i) the complexity of interpretability,
(ii) the scope of interpretability, and (iii) the level of dependency on the used
AI model [44]. Complexity-related methods can be split into intrinsic explainability
and post-hoc explainability. Intrinsic explainability is achieved by designing an
AI model whose internal functioning is directly accessible to the user, making
the model intrinsically interpretable (such as decision trees or linear regression).
In contrast, post-hoc explainability accompanies the AI model by providing insights
without knowing how the AI model works. Based on the scope of interpretability,
global interpretability refers to understanding the entire model behavior, and
local interpretability refers to understanding a single prediction. Finally, based
on the level of dependency, post-hoc explanation methods can be split into
model-agnostic methods, which can be used to explain any kind of model, and
model-specific methods, which are only suitable for specific model cases.

Especially for Deep Neural Networks (DNNs), several practical methods that fall 
under post-hoc explainability have been introduced to make those black-box mod- 
els less opaque. [45] listed four families of explanation techniques: interpretable 
local surrogates, occlusion analysis, gradient-based techniques, and layerwise rele- 
vance propagation (LRP). Interpretable local surrogates aim to replace the decision 
function with a local surrogate model that is structured so that it is self-explanatory 
(e.g., the linear model). This approach is embodied in the LIME algorithm [46], 
which was successfully applied to DNN classifiers for images and text. Occlusion
analysis is a perturbation-based technique in which patches or individual features of
the input image are repeatedly occluded and the effect on the neural network output is
measured. As a result, a heatmap can be built highlighting the locations where the
occlusion has caused the most substantial function decrease. Gradient-based techniques
involve gradient calculations in order to produce explanations. Integrated gradi- 
ents [47] explain by integrating the gradient of the output along some trajectory in 
input space connecting some root point to the given data point. Another method is 
SmoothGrad [48] where the function's gradient is averaged over a large number of 
locations corresponding to small random perturbations of the original data point. 
Finally, the LRP method [49] makes explicit use of the layered structure of the neu- 
ral network and operates in an iterative manner to produce the explanation. First, 
activations at each layer of the neural network are computed until we reach the 
output layer. The activation score in the output layer forms the prediction. Then, 
a reverse propagation pass is applied, where the output score is progressively
redistributed, layer after layer, until the input variables are reached. There are many
applications of the above-listed methods in the literature where XAI boosts the
transparency and acceptance of high-performing black-box models. In the manufacturing
domain, [50] described a post-hoc XAI analysis of deep learning for defect classification
of thin-film-transistor liquid-crystal display (TFT-LCD) panels. The authors
used post-hoc techniques to produce heatmaps as explanations for a VGG network
alongside decision trees and human interpretable rules. The results were successfully
presented to domain experts in order to boost the acceptance of the AI model.
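
Of the gradient-based techniques listed above, integrated gradients can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the "network" is a single logistic unit with hypothetical weights and an analytic gradient, the baseline (root point) is the zero vector, the trajectory is a straight line, and the integral is approximated with a midpoint Riemann sum.

```python
import numpy as np

W = np.array([1.5, -2.0, 0.5])  # hypothetical model weights
B = 0.1                         # hypothetical bias

def model(x):
    """Toy differentiable model: a single logistic unit."""
    return 1.0 / (1.0 + np.exp(-(x @ W + B)))

def model_grad(x):
    """Analytic gradient of the model output with respect to the input."""
    p = model(x)
    return p * (1.0 - p) * W

def integrated_gradients(x, baseline, steps=200):
    """Integrate the gradient along the straight path from baseline to x."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of the sub-intervals
    grads = np.stack([model_grad(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, -0.5, 2.0])
baseline = np.zeros(3)
attributions = integrated_gradients(x, baseline)
print("attributions:", attributions)
print("sum:", attributions.sum(), "output change:", model(x) - model(baseline))
```

A useful sanity check is that the attributions sum (up to discretization error) to the difference between the model output at the data point and at the baseline.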

Working both synergistically and in parallel with XAI is the field of Visual Analytics
(VA), which is associated with information visualization and human interaction
but also extends to other fields of Computer Science. As analyzed by Keim et al. [51],
it is a field that combines Data Mining, ML, Data Management,
and interactive visualizations in order to assist in decision-making processes for big
heterogeneous data sets and provide useful insights into ML inner processes. The
field of VA can be divided into two main areas: (i) data analysis/data structuring, 
which also includes Data Mining and ML techniques; (ii) interactive visualizations, 
which are responsible for data representation, explanations, and capturing complex 
human input in order to “translate” it into systems’ actions. Card et al. [52] were
among the very first to highlight the importance of data visualization; they
defined information visualization as the use of computer-supported, interactive,
visual representation of abstract data to amplify cognition. In data representation
systems, “meaning” is defined as the semantic information that can be extracted from
the data representations, and “semantics” as the formalization that represents the
meanings of the represented data. Nazemi et al. [53] extended
Card’s definition of semantics visualizations to computer-aided interactive
representations for effective exploratory search, knowledge, domain understanding, and
decision making. Keim et al. [51] noted that a VA solution needs to be expressive,
effective, and appropriate. It is specified as [54] expressive if it represents exactly
the information contained in the data (nothing more and nothing less); effective if
it successfully represents the domain-specific and context-related information; and
appropriate if it is beneficial in terms of cost/value ratio.

Traditional ML algorithms based on batch learning tend to suffer in
large real-world environments. Expert information is not considered in the learning
process, and therefore there is an issue of trust. Wu et al. [55] highlighted the
importance of VA methods when applied in industrial environments, since they
constructively involve the human factor in decision-making processes, providing more
clarity about the ML decision. Furthermore, the statistical distribution of data in an
industrial application (e.g., sensors) should be considered sensitive to changes over
time, as well as to other unforeseen factors. A sensor could be re-calibrated, and its
outputs would then differ from those that an ML algorithm was trained on in a previous
time window. VA techniques that focus on building the ML model’s input (processes
before model build or after a sudden statistical change) aim to assist domain
experts (e.g., factory workers, production line managers, etc.) in preparing their data
and to enhance human cognition in order to better comprehend the input to the ML
algorithm. These techniques can be split into two sub-categories [56]: (i) interactively
assessing the data quality in order to generate ground truth labels from noisy-sourced
labels [57]; (ii) allowing domain experts to explore complicated feature spaces
by adding another layer of semantic information to standard feature extraction
techniques. Regarding interactive feature selection, the Infuse framework [58] suggests
a powerful way of “correctly” identifying the most relevant features in large
complicated feature spaces. VA solutions extend simple visualizations such as histograms
of oriented gradients and allow users to interact with the underlying data. To that end,
much research focuses on providing deeper insights to domain experts, either through
views designed to highlight ML uncertainty or by establishing an
iterative process for overcoming this issue and proposing solutions towards Active
Learning methods. Agocs et al. [59] defined a “visual view” as the result of user
interaction by inserting a query into a system that returns the corresponding subset
of data. A view is the decomposition of directed and labeled graphs into multiple
directed and weighted graphs of lesser dimensions. Xu et al. [60] provided a
Visual Analytics approach to generate insight into the software structures of
large-scale models; Sibolla et al. [61] presented a framework to analyze and visualize
data streams from sensor observations; KagNet [62] proposes a textual inference
framework to answer commonsense questions by exploiting knowledge graphs;
RetainVis [63] proposed a method to increase user understanding of Recurrent Neural
Networks and leverage domain expertise; DeepEyes [64] is a VA framework that
assists the creation of DNNs by generating more insight on the different layers of
the Neural Network and the filters that are triggered in each layer; Legg et al. [65]
proposed a VA methodology that incorporates human input in order to increase
the confidence levels of an ML solution in dynamically changing environments.


6.4 Conclusion 


Simulation-based methods combined with confidence assessment methods can provide
an additional layer of safety over a deployed AI model. We saw that simulation
can assist the training process and synthesize novel test cases to stress the
model’s current capabilities. Those test cases can be evaluated through the pre- 
sented methods for supervised and reinforcement learning. They can then be used 
to inform model updates (e.g., sim2real) or be converted to a human interpretable 
form through XAI and VA to give deeper insight into the workings of the model. 

A fully integrated Simulated Reality component in an Industry 4.0 environment 
is expected to support different families of learning algorithms, most importantly 
supervised (e.g., learning visual defects from a pre-labeled dataset of defects and 
non-defects) and reinforcement learning (e.g., autonomous robotic arm
pick-and-place). Its core modules would be an algorithm-agnostic synthetic data generator
coupled with a confidence assessment library, usable both during model training 
and real-world model testing. Besides outputting confidence scores correspond- 
ing to model predictions or actions, the confidence assessment component will 
also include the XAI and VA sub-components to assist technical and operational 
stakeholders in understanding the model’s behavior. This will enable them to uti- 
lize their expertise in providing adjustments to improve the model’s accuracy and 
general performance. Additionally, as described in the section on sim2real transfer
in reinforcement learning, capabilities for automated model updates could be
added to bridge the gap between simulation and reality and to incorporate new
knowledge acquired in simulation back into the real-world model.


Chapter 7


The Human-Digital Twin in the 
Manufacturing Industry: Current 
Perspectives and a Glimpse of Future 


By Elias Montini, Niko Bonomi, Fabio Daniele, Andrea Bettoni, 
Paolo Pedrazzoli, Emanuele Carpanzano and Paolo Rocco 


The need to comply with shorter product life-cycles, diversified market demands 
and increased global competitiveness is leading to a dramatic increase in production 
systems requirements in terms of flexibility and responsiveness. Industry 4.0 and its 
push for digitalisation are becoming a pervasive reality impacting almost each phase 
of the company’s life cycle, from business strategy and process design to daily opera- 
tional activities. As a resulting drawback, an increasing burden is put on the workers, 
who are requested to operate and interact with complex systems, under challenging 
conditions. Furthermore, decision-making and control systems consider humans as 
an external and unpredictable element. New technological solutions are arising to 
address such challenges. Digital Twins can be applied to represent humans in the
digital world, including their intents, behaviours, conditions and emotions, providing
the ground for human-aware operations and planning. This chapter aims to
provide an overview of the most recent advancements and results in applying digital
twins to support the design, implementation and operation of human-centric
production systems.


7.1 Introduction 


From its origin, the manufacturing industry has followed a continuous evolution that
has allowed it to reach an unprecedented level of performance to satisfy increasingly
demanding customers. However, despite the steep technological evolution, humans
are still the fundamental resource of any production system.

In the manufacturing industry, more than 70% of tasks are still done man- 
ually, thus making humans accountable for generating most of the value [1]. 
Moreover, it is also expected that, by 2025, the average time spent by humans 
and machines at work will be the same as today [2]. Therefore, the human fac- 
tor is and will be an essential dimension during design, deployment and opera- 
tion of manufacturing systems. This notwithstanding, in the era of Industry 4.0, 
where Cyber-Physical Systems (CPSs) rule the roost, most systems still consider 
the human as an external and almost unpredictable element while human intents, 
physical states, characteristics and actions should be integrated into their design 
and operation [3]. The “human factor” has often been considered with reference
to the design of better ergonomics, so as to optimise interactions between workers,
their tasks and the physical elements of the factory. However, this approach
shows several limitations. First, it is performed offline, and it does not allow the
production system to be dynamically adapted to the current situation. Second, it
considers the human as an immutable entity, whose behaviours and conditions do not
evolve in time. Finally, it takes care only of physical interactions between humans
and machines, without considering more complex relations and the impacts that
such relations may have on each other and on the production system as
a whole.

In recent years, the awareness that this approach demonstrates several shortcom- 
ings has come to the surface. In digital representations of the factories, where more 
and more complete and realistic twins of machines and equipment are included, the 
human has been almost neglected. Nevertheless, humans’ characteristics, behaviours
and psychophysical conditions have a relevant impact on the performance and
operations of a production system. Neglecting these elements in the digital
representation of the production environment creates many limitations, especially
considering the strong emphasis that the concepts of Industry 5.0 [4] and Operator 4.0
place on human factors. Therefore, in order to take the digital representation of
the production systems a step further, humans have to be modelled and included.
To realise this ambitious goal, the creation of a Human Digital Twin (HDT) is a
relevant challenge.

In this regard, this work aims to explore recent advancements in the digital
representation of humans in production systems. The chapter is structured as follows.
Section 7.2 introduces the concepts of Operator 4.0 and of Human Cyber-Physical-Systems
(H-CPSs). Section 7.3 provides a review of the main applications of the HDT in
the manufacturing industry. Section 7.4 presents the most relevant technologies to
realise an HDT that can be applied in the manufacturing sector. Section 7.5 describes
the approach adopted in the STAR project towards HDT development
and adoption. Finally, concluding remarks are given in Section 7.6.


7.2 The Operator 4.0 and Human 
Cyber-Physical-Systems 


Each industrial revolution has led to a deep and significant transformation in the 
way production systems, engineering processes and manufacturing products are 
designed. Unavoidably, the operators’ activities, responsibilities and nature of work 
have also undergone drastic modifications. 

The spreading of the approaches and technologies belonging to the Industry 4.0 
paradigm has radically changed the operators’ roles in the modern factory environ- 
ment leading to the definition of the Operator 4.0, known as “a smart and skilled 
operator who does not only perform cooperative work with robots but also works aided 
by machines as and if needed by means of Human-Cyber-Physical-Systems, advanced 
human-machine interaction technologies and adaptive automation towards achieving 
human-automation symbiosis work systems” [5]. In this sense, the Operator 4.0 is 
considered a hybrid resource arising from the relationship between humans
and machines, where the focus is, on the one hand, on treating automation as a further
extension of the human’s physical, interaction and cognitive capabilities and,
on the other hand, on considering the human as a precious source of information within
the smart production environment [6].

The Operator 4.0 vision involves an operator surrounded by a digital work sys- 
tem which is perfectly suited for workers with different skills, capabilities, pref- 
erences, and background and also capable of maximizing their motivation and 
performance [7]. Specifically, the Operator 4.0 vision is strictly correlated with 
the concept of Human-Cyber-Physical Systems intentionally designed to enable 
the collaboration between humans and machines. An H-CPS is a system engineered
to: (a) improve human abilities to dynamically interact with machines in the
cyber and physical worlds employing intelligent human-machine interfaces, using
human-computer interaction techniques designed to fit the operators’ cognitive and
physical needs, and (b) improve human physical, sensing and cognitive capabilities,
using various enriched and enhanced technologies (e.g., wearable devices) [8].

The current conception of H-CPS is the result of a long path marked by the
technological findings of industrial evolutions. When the manufacturing sector entered
the era of digitalization at the end of the 20th century, the digital world interposed
itself and started to link together human and physical systems, giving life to the H-CPS.
The latter considerably enhanced the computation, efficiency, control, precision and
capabilities of production systems, making it possible to connect the digital, the
physical, and the human side of these systems [9]. In the last decades, huge progress has
been made in information technologies, leading to the breakthrough of Digital Twins
(DTs), adding further features to the H-CPS.

The National Aeronautics and Space Administration (NASA) introduced the DT as “an
integrated multi-physics, multi-scale, probabilistic simulation of a flying vehicle or
systems” [10]. From that moment, the concept has been applied in many domains,
including manufacturing. With this wide adoption, the concept evolved. Nowadays,
considering the most recent definitions and the scope of this work, the DT
can be defined as follows: “A digital twin is a digital replica of a physical entity. The Digital twin
refers to actual or potential status of physical assets, processes, people, systems and devices
that can be used for various purposes: planning, optimization, what-if analysis, monitoring...
The DT is a dynamic virtual representation of a physical object/system across
its lifecycle, using real-time data to enable understanding, learning and reasoning”. It is
necessary to emphasize that the DT should not be confused with a simulation
model. The DT has to be a high-fidelity virtual replica of a physical entity with
real-time two-way communication supporting simulation and decision-making for
product service enhancement [11].

Throughout the H-CPS evolution phases, the digital system has undergone
several major enhancements, while its connection with the human has
remained almost unchanged. Despite the hundreds of examples of DTs applied to
manufacturing products, devices, and machines, only a few works addressing the
HDT can be found, even though the operator continues to be a key resource with
relevant impacts on the performance of the manufacturing system. For these reasons,
to bring the H-CPS a step further, humans have to be modelled and included
in the digital world, together with the existing DTs.


7.3 The Human Digital Twin: A Review 
of Existing Applications 


According to Segan and colleagues, the HDT can include models fed by dynamic
and real-time data merged with static or quasi-static ones, enabling a comprehensive
representation of the human entity [12]. To gather real-time information, activities
and behaviours performed by humans have to be recorded. Moreover, these
real-time data have to be compared with historical data, which are stored and
formalised together with information describing human characteristics and
conditions [13]. The HDT has to provide not only a digital representation of
anthropometric and physiological features but also a representation of a person’s inner
state [14].
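
The combination of quasi-static and real-time data described above can be sketched as a small data structure. This is only an illustrative assumption of how such a twin might be organized, not a model taken from [12] or [13]: the class name, the fields, and the use of heart rate as the single real-time signal are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class HumanDigitalTwin:
    worker_id: str
    static_profile: dict  # quasi-static data, e.g. anthropometrics and skills
    history: deque = field(default_factory=lambda: deque(maxlen=1000))  # past readings

    def update(self, heart_rate):
        """Store a real-time reading and return its deviation from the historical mean."""
        baseline = mean(self.history) if self.history else heart_rate
        self.history.append(heart_rate)
        return heart_rate - baseline

twin = HumanDigitalTwin("worker-01", {"height_cm": 178, "skills": ["assembly"]})
for reading in (70, 72, 71, 69):  # hypothetical historical heart-rate samples
    twin.update(reading)
deviation = twin.update(95)       # new real-time reading compared with the history
print("deviation from historical mean:", deviation)
```

A deviation of this kind is only a raw indicator; inferring a person's inner state from it would require the behavioural and decision models discussed in the remainder of the chapter.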

In [15] a meta-model is defined to realise a modular and tailored HDT, comprehensive
of all the entities that need to be modelled to create an HDT. These include
the worker’s characteristics; medical, emotional and psychophysical conditions;
psychophysical and geospatial parameters; and contextual, functional and decision
models. The HDT is a necessary technology to facilitate human worker integration in
an Industry 4.0 environment to address communication, data aggregation,
simulation and scheduling [16]. The emerging applications of the HDT realized in the
manufacturing industry in the last 5 years mainly include worker monitoring,
production planning and scheduling, human-robot collaboration and adaptive
automation.


Worker Well-being Monitoring 


Thanks to miniaturization and cost reduction, the adoption of wearables
and sensors has also been growing in the industrial context to investigate workers’
conditions and well-being. Employees’ well-being is a key factor in determining
the organization’s long-term competitiveness and it is also directly related to
production efficiency. The cumulative effect of positive impacts on the human
factor brings economic benefit through productivity increase, scrap reduction and
decrease of absenteeism. A few research works have recently been developed in which
workers’ physiological data are used to infer the onset of phenomena such
as fatigue [17, 18] and mental stress [19] which, in turn, have a relevant impact
on process performance. Another research line adopted eye-trackers, together with
wearables and cameras, to estimate workers’ attention and stress levels, to understand
the assembly sequence and to identify the criticalities in the product design affecting
the assembly process [20].


Production Planning and Allocation 


In labour-intensive production systems, workers allocation is one of the key 
activities as capacity and worker skills are the main factors that affect the pro- 
duction rate [21]. The exclusion of such elements from the problem definition 
may result in poor performance. To this end, the HDT has been applied to
support production planning and allocation. [22, 23] propose an HDT to store,
organize and communicate workers’ skills, preferences and virtualized personality, and
to enable humans to take part in a decentralized computational decision-making
process which leads to improved task scheduling. However, in this context
and application, it is fundamental to consider that, despite the different taxonomies
and methods that exist to assess workers’ skills, like ESCO and O*NET,
humans learn and grow, becoming able to deal with novel challenges. This is a relevant
challenge to consider in realising an HDT supporting production planning
and scheduling.


Human-Robot Collaboration and Adaptive Automation 


In the last decades, adaptive control systems have been explored, including workers’
features within real-time control loops [24-26]. The HDT is crucial for such
approaches. [27] used a Microsoft Kinect sensor to identify and track a
worker within a work cell to identify the possible collision areas with a robot. This
information has been used to optimize robot trajectories in real time in order to
reduce collisions. In [28], an HDT has been adopted to monitor worker fatigue
and mental stress by introducing a physiological monitoring system and a smart
decision-maker to adjust the level of support offered through a collaborative robot
(cobot).


Ergonomics Analysis and Layout Design 


Many examples exist where the HDT has been used to analyse the work cell
ergonomics and to define the best layout considering worker characteristics,
anthropometric ones above all. Many of them apply off-line simulation of
humans [29]. Moreover, a few real-time examples can be found, where worker posture
and characteristics are computed using motion capture tools [30] and even
used to feed control systems to perform collaborative and ergonomic tasks [31].


Other Context Applications Outside the Manufacturing Industry 


Extending the scope to the use of the HDT outside the manufacturing industry, it
is possible to find interesting applications. A digital representation of construction
workers has been created by automatically collecting physiological parameters
(e.g., human heart rate, upper body posture angle, traveling speed) to identify
workload severity [32]. In the same sector, workers’ thoracic posture and
spatio-temporal data have been used to estimate activity types and assess productivity in
real-time [33]. In the medical and fitness fields, some applications implement the
automatic exchange of data relying on wearable devices to compute health conditions
[34] and predict athletes’ performance [35].


7.4 The Technological Framework
for Human Digital Twins


Taking inspiration from works describing the technologies and architectures behind
the creation of an HDT [28, 36] and, more generally, from those dedicated to DTs in
a broader sense [37, 38], three technological layers, as shown in Figure 7.1, are
fundamental to realise an HDT, embracing all the cutting-edge technologies and
methodologies that enable the digitization of a worker, together with his/her
characteristics, conditions and behaviours.

Figure 7.1. Abstract digital twin architecture.

The Sensors Layer represents the connection from the physical to the digital
world. It is in charge of creating data from the physical production system and of
communicating them in quasi-real time using standard data formats. In the case
of the human, the hardware installed to realise part of this connection consists
mainly of wearable sensors that capture psychophysical parameters like Heart Rate
(HR), Skin Conductance (SC) or Galvanic Skin Response (GSR), while the machinery
present on the shop floor needs to be fitted with sensors that generate useful data
to be shared with the upper layers. In the case of a HDT, a gateway is in many cases
the best solution to connect wearables and sensors with the upper layers [39].
This is particularly helpful when battery capacity on the wearable side is limited,
since it allows these devices to be smaller, more comfortable and cheaper. A similar
concept is applied to machinery and robots, since most of them use industrial
standard protocols like OPC-UA or Euromap, or “simply” offer a PLC communication
interface. In this case, the gateway acts as a middleware that translates the
machine protocols into the single standard communication protocol known by the
whole HDT. This approach yields a simple plug and play architecture, which only
requires building specific gateways without changing anything upstream. Gateways
can be installed on many types of devices, including smartphones, computers or a
simple Raspberry Pi. The key factor to consider is that this device needs to be
able to join a network and to have enough capacity to keep a constant flow of data
to and from the Middleware, which is the next layer and resides between the HDT
Core and the Sensors Layer. In this regard, NTN 5G is a relevant enabler that
provides low-latency and high-reliability data streams without the need for wired
connections [40]. Low-level computation or pre-processing of data may reside in
the Sensors Layer, depending on computational requirements and applications. If
not properly performed, pre-processing may lead to information loss. However, if
executed correctly, it can bring relevant benefits in terms of communication
efficiency, security and scalability.
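
The gateway translation step described above can be illustrated with a minimal
sketch. It assumes two hypothetical raw payload shapes (one from a wearable, one
from a PLC) and a hypothetical common message format; none of the field names
below come from the STAR project or from a specific device.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Reading:
    """Hypothetical common format shared by every gateway."""
    source_id: str
    feature: str
    value: float
    unit: str
    timestamp: float


def from_wearable(raw: dict) -> Reading:
    # Assumed wearable payload: {"dev": "w-01", "hr": 72, "ts": 1700000000.0}
    return Reading(raw["dev"], "heart_rate", float(raw["hr"]), "bpm", raw["ts"])


def from_plc(raw: dict) -> Reading:
    # Assumed PLC payload: a raw register value plus a scale factor.
    value = raw["raw"] * raw["scale"]
    return Reading(raw["plc"], raw["tag"], value, raw.get("unit", ""), raw["ts"])


def to_message(reading: Reading) -> str:
    # Serialise to JSON so that the upper layers only ever see one format.
    return json.dumps(asdict(reading), sort_keys=True)


wearable_msg = to_message(
    from_wearable({"dev": "w-01", "hr": 72, "ts": 1700000000.0}))
plc_msg = to_message(from_plc(
    {"plc": "press-3", "tag": "spindle_temp", "raw": 83, "scale": 0.5,
     "unit": "degC", "ts": 1700000000.5}))
```

Adding a new device type then only requires one more `from_...` function, which
mirrors the plug and play property of building a new gateway without touching
anything upstream.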

The Middleware enables an active connection from and to the HDT Core.
Since it has to withstand high-frequency data coming from multiple sensors
installed in the Sensors Layer, the Middleware has to be well designed and
implemented to exploit data flows from the physical to the digital layer. It has
to support the coherent integration between the models and the physical
architecture, enabling a seamless usage of data for verification and validation of
complex behaviours. There are various tools and technologies that support the
implementation of this layer. The most commonly used solutions are based on
MQTT [41], a well-established data exchange protocol already widely used in the
IoT and industrial world. One alternative is Apache Kafka [42], an event streaming
platform that uses its own protocol rather than MQTT and offers many useful
functions, including message retention to keep a backlog of the last exchanged
messages. However, in some use cases a simpler and lighter approach, such as the
one adopted by the MQTT broker Mosquitto [43], is preferable for performance.
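
To make the publish-subscribe mechanism concrete, the sketch below implements
MQTT-style topic matching (`+` matches exactly one level, `#` matches all
remaining levels) on top of a tiny in-memory broker. The broker class is a
stand-in written for illustration so that the example runs without a network; it
is not the API of Mosquitto or of any MQTT client library, and the topic names
are hypothetical. Special rules for topics starting with `$` are ignored.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """MQTT-style match: '+' is one level, '#' is the remaining levels."""
    p_levels = pattern.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(p_levels):
        if level == "#":
            # '#' is only valid as the last level of a pattern.
            return i == len(p_levels) - 1
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    return len(p_levels) == len(t_levels)


class InMemoryBroker:
    """Illustrative stand-in for a real broker (no network, no persistence)."""

    def __init__(self):
        self._subscriptions = []  # list of (pattern, callback)

    def subscribe(self, pattern, callback):
        self._subscriptions.append((pattern, callback))

    def publish(self, topic, payload):
        for pattern, callback in self._subscriptions:
            if topic_matches(pattern, topic):
                callback(topic, payload)


broker = InMemoryBroker()
received = []
# A module interested in the heart rate of every worker.
broker.subscribe("hdt/+/heart_rate", lambda t, p: received.append((t, p)))
broker.publish("hdt/worker-01/heart_rate", 72)
broker.publish("hdt/worker-01/gsr", 0.4)  # not delivered to this subscriber
```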

The third layer is the core of the HDT, where data persistence, the governing
module and the various data processing, monitoring, behavioural and decision
modules are located. Data persistence groups all the data storage functionalities,
including databases that contain the data models describing the entities and
features, and historical storage for the data coming from the sensors. For the
latter, it is recommended to use a time-series database like InfluxDB [44] or
QuantumLeap [45], optimized for this kind of data. Finally, there are functional
modules, which are capable of processing, simulating, predicting, reasoning and
deciding. The best approach to include data processing, monitoring, behavioural
and decision modules is to adopt a series of plug and play services. The key
benefit of this approach, obtained thanks to a proper architecture design, is that
modules can easily be removed, added or extended [46]. An alternative is the
development of specific plugins. However, this requires the development of a
dedicated SDK and imposes a specific programming language. With services, by
contrast, the developer is free to select any programming language, since the
communication interface is universal. Modules can be of different natures. Data
processing modules focus on the post-processing of the data that transit on the
Middleware, which can include data validation, classification, aggregation,
sorting and cleaning. Monitoring modules focus on specific features and
attributes, tracking their evolution, elaborating raw data to compute more complex
information and detecting possible deviations. Decision modules identify decisions
through the HDT in order to intervene in the digital and/or in the physical world.
Behavioural modules elaborate the current status of the HDT to make predictions
and simulate its evolution.
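
As an illustration of what a monitoring module might do, the following sketch
flags a sensor value as a deviation when it lies far from the mean of a sliding
window of recent values. The class name, window size and threshold are
assumptions chosen for the example; the chapter does not prescribe a specific
detection rule.

```python
from collections import deque
from statistics import mean, pstdev


class DeviationMonitor:
    """Flags values that deviate strongly from a window of recent values."""

    def __init__(self, window: int = 5, threshold: float = 3.0):
        self.values = deque(maxlen=window)
        self.threshold = threshold

    def update(self, value: float) -> bool:
        """Return True if `value` deviates from the current window."""
        flagged = False
        if len(self.values) == self.values.maxlen:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            # Only judge the value once the window is full and not constant.
            flagged = sigma > 0 and abs(value - mu) > self.threshold * sigma
        self.values.append(value)
        return flagged


monitor = DeviationMonitor(window=5, threshold=3.0)
# Hypothetical heart-rate stream in beats per minute.
flags = [monitor.update(v) for v in [70, 71, 69, 70, 72, 71, 120]]
```

Only the last value is flagged here. In a service-based design, such a module
would subscribe to a feature on the Middleware and publish its flags back as a
new, derived feature.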

Artificial Intelligence (AI) plays a prominent role in the creation of such
modules. One of the main goals of AI is to build software systems, models and
algorithms capable of performing complex tasks [47]. The behaviours and variables
that involve a human are much more complex than those that characterize machines
and robots. Furthermore, humans in a system are far more unpredictable and have
greater degrees of freedom than a machine, which is usually stationary, always
performs the same tasks and has a set of standard components. For these reasons,
AI is of major relevance for creating a HDT, since it allows models to be created
without explicitly knowing the relationship between the inputs and the outputs.
Without the help of AI, it would be very complex to define heuristic relationships
between, for example, physiological data such as HR or Heart Rate Variability
(HRV) and physical or mental stress. However, AI alone is not enough: it needs
valuable data to build effective models and algorithms. Moreover, when humans are
modelled with AI, it is also fundamental to consider the related ethics and
explainability issues.


7.5 The STAR Approach Towards Human Digital Twins 
for Manufacturing Applications 


To take a step forward in the sustainability of production systems, it is
necessary to evolve current approaches and technologies, which are mainly oriented
to digitalisation aspects, by also including human factors. Therefore, the STAR
project proposes a novel approach to the digitalisation of humans. To get the most
from workers and production systems, where a multitude of other players act,
including robots, automation systems, AGVs and machines, it is necessary to
include a more sophisticated and complete representation of workers. Humans and
machines must not be considered independent, but as collaborative entities that
complement each other’s capacities in order to achieve improved manufacturing
performance, and their historical data, status and evolution must be available for
analysis and optimization. To achieve this goal, the digital representation of
workers proposed by STAR includes contextual data (e.g., assigned job, current
workplace, current shift, training program), quasi-static data (e.g., worker
needs, skills, height, age), real-time sensor data (e.g., heart rate, heart rate
variability, galvanic skin response, temperature) and dynamic data (e.g., fatigue
level, emotions, position, current activity). Thanks to distributed sensing
solutions that gather data about the workers and the surrounding workplace, it is
possible to build a digital replica of a human. The STAR HDT includes: (i) sensing
modules to easily connect sensors and gather data from the shop floor; (ii) a
specific component to collect, structure, store and use data to realise the
digital representation; (iii) a set of AI and non-AI modules to elaborate sensor
data and compute complex features.
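
The four categories of worker data listed above can be pictured as a simple data
structure. The sketch below is only an illustration: the class, its method names
and the example values are hypothetical and do not reproduce the STAR data model.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerTwin:
    worker_id: str
    contextual: dict = field(default_factory=dict)    # e.g. assigned job, shift
    quasi_static: dict = field(default_factory=dict)  # e.g. skills, height, age
    sensor: dict = field(default_factory=dict)        # latest real-time readings
    dynamic: dict = field(default_factory=dict)       # features computed by modules

    def ingest(self, feature: str, value: float) -> None:
        """Store the latest real-time sensor value for a feature."""
        self.sensor[feature] = value

    def set_dynamic(self, feature: str, value) -> None:
        """Store a feature computed by an AI or non-AI module."""
        self.dynamic[feature] = value

    def snapshot(self) -> dict:
        """Flat view of the twin, with keys prefixed by their data category."""
        view = {"worker_id": self.worker_id}
        for category in ("contextual", "quasi_static", "sensor", "dynamic"):
            for key, value in getattr(self, category).items():
                view[f"{category}.{key}"] = value
        return view


twin = WorkerTwin("worker-01",
                  contextual={"shift": "morning"},
                  quasi_static={"height_cm": 178})
twin.ingest("heart_rate", 72)
twin.set_dynamic("current_activity", "assembly")
```

Keeping the categories separate reflects their different update rates:
quasi-static data change rarely, sensor data arrive continuously, and dynamic
data are recomputed by modules.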

The STAR HDT can be considered a single source of truth for worker-related
data. It offers a centralized access point to exploit a wider set of
worker-related data. STAR creates a digital representation of the workers,
seamlessly integrated with production system DTs, that can be exploited by
AI-based modules to compute complex features, feeding and enriching the HDT
itself, or to make better decisions, dynamically adapting the behaviour of
automation systems with a view to both production performance and workers’ safety
and well-being. The STAR HDT is organised in three main technological layers,
according to the architecture introduced in Section 7.4, and is composed of the
following components, as depicted in Figure 7.2.


Figure 7.2. STAR’s HDT architecture.

• Shop-floor entities, agents and gateways (Sensors Layer): sensors, wearables
and PLCs collect and stream data from the shop floor. To facilitate data
gathering from the workers and the production system entities, the HDT
integrates agents and gateways that ensure data collection, harmonisation
and accessibility from heterogeneous sources and create bridges between
these sources and the upper layers.

• IoT Middleware (Middleware): this layer supports M2M connection and is
based on the lightweight MQTT messaging protocol. It allows bi-directional
communication under a publish-subscribe mechanism and the organisation of
large amounts of heterogeneous data into multiple topics. Each user has a
set of channels where data are streamed and from which they are accessed
by the modules that need them for further computations.

• Data storage and Time Series Data Storage (Human Digital Twin Core: Data
Persistence): the data storage holds the structure and core information of
the HDT. In addition, the workers’ quasi-static data are persisted in this
component. Meanwhile, the Time Series Data Storage acts as a backlog of
sensor data, which the various entities of the HDT can access in order to
make predictions or extract features for computations.

• Orchestrator and Models (Human Digital Twin Core: Governance module):
this component is responsible for managing all the entities in the HDT. It
knows exactly which kind of data each sensor is producing, which workers
are online and where their data are published. In addition, it also knows
the modules currently in use, which information they take as input and
where they publish their outputs. Models are a set of descriptors, defined
by the administrator of the HDT, that describe any worker or contextual
feature.

• Worker monitoring modules (Human Digital Twin Core: Data processing,
analysis and decision modules): these modules elaborate data from workers,
contextual sensors or any kind of system that publishes data on the IoT
Middleware. They target the detection of human status and conditions and
compute complex features, allowing human and machine decision-makers to
consider human factors within their execution and control logics.
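
The bookkeeping role of the Orchestrator described in the list above can be
sketched as a small registry. This is a hypothetical illustration, not the STAR
implementation: the class, its methods, the sensor and module names and the
topic names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleInfo:
    name: str
    inputs: tuple   # topics the module reads
    outputs: tuple  # topics the module publishes


class Orchestrator:
    def __init__(self):
        self.sensors = {}  # sensor id -> topic it publishes on
        self.modules = {}  # module name -> ModuleInfo

    def register_sensor(self, sensor_id: str, topic: str) -> None:
        self.sensors[sensor_id] = topic

    def register_module(self, info: ModuleInfo) -> None:
        self.modules[info.name] = info

    def consumers_of(self, topic: str) -> list:
        """Names of the modules that take `topic` as input."""
        return sorted(m.name for m in self.modules.values() if topic in m.inputs)

    def unresolved_inputs(self) -> dict:
        """Inputs that no registered sensor or module currently produces."""
        available = set(self.sensors.values())
        for module in self.modules.values():
            available.update(module.outputs)
        missing = {}
        for module in self.modules.values():
            gaps = sorted(set(module.inputs) - available)
            if gaps:
                missing[module.name] = gaps
        return missing


orch = Orchestrator()
orch.register_sensor("hr-band-7", "hdt/worker-01/heart_rate")
orch.register_module(ModuleInfo(
    "fatigue_monitor",
    inputs=("hdt/worker-01/heart_rate",),
    outputs=("hdt/worker-01/fatigue",)))
orch.register_module(ModuleInfo(
    "task_advisor",
    inputs=("hdt/worker-01/fatigue", "hdt/worker-01/position"),
    outputs=("hdt/worker-01/advice",)))
```

With this registry, a missing data source (here, no sensor publishes the
worker position) can be spotted before a module is started.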


The STAR HDT has been conceived and designed to be extensible and scalable. In
the STAR project, the HDT is applied in two different use cases: (i) Human-Cobot
Collaboration improving robust Quality Inspections; (ii) Human Behaviour
Prediction and Safety Zones Detection for Routing. For these first experiments,
sensors and specific human features have been selected (e.g., heart rate,
accelerations, occupied space) to fulfil the expected objectives (e.g., monitor
worker fatigue, predict safety zones). However, the STAR HDT has been conceived
to easily integrate new types of sensors and devices and to characterise and
model several other features. This is made possible by defining a set of shared
interfaces that provide an easily integrable and reliable plug and play
environment. In addition to extensibility, it is also essential to consider
scalability in this kind of solution. Different production systems involve
different numbers of operators, machines and sensors. The HDT must be applicable
in different environments, collecting data from multiple sources and from
numerous different workers and contexts, while always ensuring the same
reliability.

Moreover, the STAR HDT has been designed to be interoperable. It integrates
Models using a modular and flexible syntax that is understandable to humans and
machines alike. These Models describe how data and broadcast messages are
structured, so that all the software components can collaborate. This, together
with the IoT Middleware that supports the exchange between different types of
sources and destinations, makes it possible to integrate different types of AI
modules and to receive and share data with different types of systems, including
simulation engines, PLCs and legacy ICT systems.
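
A Model that describes how a message is structured can be pictured as a
descriptor against which incoming messages are checked. The descriptor layout
and the `validate` helper below are assumptions made for illustration; the
actual syntax of the STAR Models is not given in this chapter.

```python
# Hypothetical descriptor for a heart-rate message: field name -> expected type.
HEART_RATE_MODEL = {
    "feature": "heart_rate",
    "fields": {
        "worker_id": str,
        "value": float,
        "unit": str,
        "timestamp": float,
    },
}


def validate(message: dict, model: dict) -> list:
    """Return a list of problems; an empty list means the message conforms."""
    errors = []
    for name, expected_type in model["fields"].items():
        if name not in message:
            errors.append(f"missing field: {name}")
        elif not isinstance(message[name], expected_type):
            errors.append(
                f"wrong type for {name}: expected {expected_type.__name__}")
    return errors


good = {"worker_id": "worker-01", "value": 72.0, "unit": "bpm",
        "timestamp": 1700000000.0}
bad = {"worker_id": "worker-01", "value": "72"}
good_errors = validate(good, HEART_RATE_MODEL)
bad_errors = validate(bad, HEART_RATE_MODEL)
```

Because every component validates against the same descriptor, a module written
in any language can rely on the shape of the messages it receives.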

The HDT is indeed a cornerstone for human-aware optimization, simulation,
what-if analysis and monitoring, which are key strategic activities for
manufacturing companies seeking to improve the efficiency of the configuration
and use of production resources. Based on massive, cumulative, real-time,
real-world data, the HDT represents an evolving profile of the human-centric
process in the digital world, which provides important insights into system
performance and sustainability, leading to effective actions in the physical
world.


7.6 Concluding Remarks 


In the present literature the concept of DT is well established and numerous
applications involving products, machines and equipment exist. Nevertheless, only
very few examples involve human aspects. To cover this gap, this work provides
technological and architectural insights to realise a HDT, detailing how the
presented approach is developed and used in the context of the STAR project.
Future research will focus on the development of libraries to realise the HDT
based on the described technological architecture, on the improvement of existing
Worker Monitoring Modules and on the development of a new module supporting
decision-making within collaborative work cells.
Chapter 8


Video Analytics for Situation Awareness 
Safe Robot-Human Cohabitation 
in Production Lines 


By Jean-Emmanuel Haugeard and Andreina Chietera 


With the Fourth Industrial Revolution (or Industry 4.0), the automation of
traditional manufacturing and industrial practices requires the deployment of
mobile robots that carry out several tasks to assist workers in a modular
production line. The robots are equipped with several embedded sensors (radar,
camera) to analyse the nearby environment, in order to move safely and avoid
obstacles. Despite this, the technology does not provide the robots with a
dynamic global view of the scene. Thus, the cohabitation between humans and
robots can lead to dangerous situations. In order to ensure safety between robots
and workers, safety zones must be detected dynamically throughout the
infrastructure. To that end, we will implement algorithms that analyse the scene
using the global point of view of the camera network already deployed in the
factory. Video analytics makes it possible to exploit the video streams
automatically and in real time, with the aim of detecting anomalies and raising
an alarm immediately. The algorithms detect and track elements of interest (such
as people, robots and new objects occupying the scene) over time, and alert the
robots to the presence of any obstacles in the surrounding area. When a human is
detected close to the robot, his/her movements are monitored. Based on a human
behaviour analysis, the system decides whether a new robot path should be
calculated to reach the docking station or whether the robot should stop
completely to avoid any collision. This chapter presents a brief overview of
modern computer vision approaches to detect objects of interest in video streams
and to localize them in the 3D environment. The purpose of these video analytics
is to feed a “planner” indicating dynamically which areas should be avoided by a
robot fleet operating in the production lines.


8.1 Introduction 


One of the main goals of STAR is to ensure the optimization of a production line to 
increase the efficiency of the manufacturing process. We start from the assumption 
that efficiency and safety go hand in hand in the complex environment of a pro- 
duction line, in which operators, robots and automatic systems share dynamically 
the same physical workspace. 

The aim of this task is to take advantage of modern computer vision approaches 
in order to recognize postures and motion of workers and locate them as well as 
the items occupying the environment. The main output will be an “average spatial 
heatmap” representing a probabilistic occupancy of the production lines based on 
fixed RGB cameras deployed in the factory. The purpose of this module is to feed 
a “planner” indicating dynamically which areas should be avoided by the robots’ 
fleet. 
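
The chapter does not specify how the “average spatial heatmap” is computed. One
simple interpretation, sketched below under that caveat, is to divide the floor
into square cells and to report, for each cell, the fraction of frames in which
at least one detected entity stood in it. The function name, the grid
resolution and the example positions are assumptions.

```python
import math


def occupancy_heatmap(frames, width_m, depth_m, cell_m):
    """Per-cell fraction of frames in which the cell was occupied.

    `frames` is a list of frames; each frame is a list of (x, y) ground
    positions in metres of the entities detected in that frame.
    """
    cols = math.ceil(width_m / cell_m)
    rows = math.ceil(depth_m / cell_m)
    counts = [[0] * cols for _ in range(rows)]
    for positions in frames:
        occupied = set()
        for x, y in positions:
            if 0 <= x < width_m and 0 <= y < depth_m:
                occupied.add((int(y // cell_m), int(x // cell_m)))
        # Count each cell at most once per frame.
        for row, col in occupied:
            counts[row][col] += 1
    n = len(frames)
    if n == 0:
        return [[0.0] * cols for _ in range(rows)]
    return [[count / n for count in row] for row in counts]


# Four frames over a 4 m x 2 m floor area with 1 m cells.
frames = [
    [(0.5, 0.5)],
    [(0.5, 0.5), (2.5, 1.5)],
    [],
    [(0.2, 0.9)],
]
heat = occupancy_heatmap(frames, width_m=4.0, depth_m=2.0, cell_m=1.0)
```

A planner could then treat cells whose value exceeds a chosen threshold as areas
to avoid.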

The envisioned solution merges the following technologies:


• Dynamic object detection via Convolutional Neural Network (CNN)

• Skeleton extraction by a human pose detection CNN

• 3D localization and motion in the infrastructure and estimation of
human-robot distances using the geometric calibration of fixed RGB cameras

• Heterogeneous and homogeneous multi-sensor fusion merging video analytics
results coming from cameras distributed in the production lines with other
localization sensor data.
8.2 Detection of Objects of Interest in Video Streams:
A Short Overview


The aim of this section is to highlight some video analytics approaches based on
object detection and classification. In our context of monitoring factories for
safety purposes, the STAR system must be able to analyse the scene and monitor
the robots deployed in the factories. The aim of this video analysis module is to
include in the STAR system a software component that detects empty areas for safe
robot displacements. The envisioned system should be able to detect obstacles in
order to avoid collisions and to modify the robot planner dynamically. The
objects to detect are: moving items, static objects or obstacles on the
navigation path, and humans occupying the robot’s neighbourhood.


8.2.1 Moving Object Detection Using Background Modeling 


Segmenting the image into background regions and moving objects is a crucial
stage in video applications. The segmentation result is often used as an input
for object detection/classification. Background subtraction methods are based on
the premise that the difference between the background model and the current
image is due to the presence of moving objects in the scene under observation.

The proposed approaches first build a model of the background of the observed
scene and then analyse the differences between each image and the estimated
background (cf. Figure 8.1).

The foreground segmentation is possible under certain conditions: 


• The camera is static (properties do not change)

• The background is statically visible most of the time

• The background is quasi-stable and can be modelled statistically over time

• Objects of interest are different (color/texture) from the background model in
order to detect the difference between the current image and the background
model.


A detailed survey of various background modeling methods in video analysis
applications can be found in [1]. Background subtraction approaches can be
divided into four categories:


• Basic methods: define the background as the mean or median of the observed
values.

• Filtering methods (e.g. Wiener filter [2], Kalman filter [3]): design dynamic
backgrounds by adapting the model using a filter.

• Clustering methods (e.g. K-Means [4], Codebooks [5]): compare the current
pixel with the different clusters at every point in the image.

• Stochastic methods (e.g. Gaussian model [6], Gaussian mixture model or
GMM [7], kernel density estimation or KDE [8]): use probabilistic modeling
of the background.


Figure 8.1. Basic steps for background subtraction algorithms.


Stochastic methods (GMM approaches) are the most commonly used in video
applications.
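
The simplest of these categories, the basic method, can be sketched in a few
lines: the background is the per-pixel median of a set of past frames, and a
pixel belongs to the foreground when it differs from the background by more than
a threshold. This is an illustration of the basic category only, not of the GMM
approach; frames are assumed to be small grayscale images stored as nested
lists, and the threshold value is arbitrary.

```python
from statistics import median


def median_background(frames):
    """Per-pixel median over a list of equally sized grayscale frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(frame[r][c] for frame in frames) for c in range(width)]
            for r in range(height)]


def foreground_mask(frame, background, threshold=25):
    """True where the current frame differs enough from the background."""
    return [[abs(pixel - bg) > threshold
             for pixel, bg in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


# Three noisy frames of a static 2 x 3 scene (intensity around 10).
history = [
    [[10, 12, 11], [9, 10, 10]],
    [[11, 10, 10], [10, 11, 9]],
    [[10, 11, 12], [10, 10, 10]],
]
background = median_background(history)

# A new frame in which two pixels are covered by a bright object.
current = [[10, 200, 11], [10, 10, 180]]
mask = foreground_mask(current, background)
```

The mask marks exactly the two changed pixels. The same thresholding would also
react to a global lighting change, which is one reason why statistical models
that adapt over time are often preferred in practice.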

Background subtraction extracts the “foreground” of the scene, namely the
silhouettes of new or moving objects (people, vehicles, objects newly entering
the camera’s field of view), but it also extracts areas in which the lighting
changed due to variations of the lighting conditions during the day. Moreover, if
objects are close to each other and/or occlude each other, their silhouettes are
merged into a single element and the resulting foreground is difficult to analyse
by its shape.

This type of approach therefore makes it possible to detect all the changes in
the scene, which may correspond to the presence of a new element (objects or
people), but also to the presence of moving people or robots.


8.2.2 Object Detection and Classification Using CNN 


Today, as in the field of image classification, state-of-the-art object
detection approaches are largely based on Convolutional Neural Network (CNN)
architectures. These solutions consist of two parts: a “feature extractor”,
called the backbone, and a “feature classifier”. In the field of object detection
based on deep learning [9], the architectures can usually be divided into two
categories: two-stage and single-stage approaches.


• Two-stage detector


Two-stage networks use a region proposal step (for example the “Region Proposal
Network” introduced with Faster R-CNN) as a first step to quickly select the best
candidate windows. These windows (from a few hundred to a few thousand) are then
processed by a classification model (the second step) to decide whether or not
they contain an object from the list considered. The most cited examples are the
R-CNN model (Regions with CNN features [10]) and its derivatives: Fast
R-CNN [11], Faster R-CNN [12] and Mask R-CNN [13].


• One-stage detector


One-stage detectors predict boxes directly from input images without the region
proposal step; they are therefore time efficient and can be used for real-time
applications. They apply the classification directly to dense window grids
(“anchors”) of different sizes (cf. Figure 8.2). The two main representatives of
this family are the YOLO model (You Only Look Once [14]) and its derivatives
(YOLOv2, YOLOv3 [15], YOLO9000), and the SSD model (Single Shot Detector [16]).
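
The role of the grid can be made concrete with a small sketch. It assumes the
box parameterization of the first YOLO version, in which the image is divided
into an S x S grid, the cell containing the centre of an object is responsible
for it, the centre is predicted as an offset within that cell and the width and
height are predicted relative to the whole image. Later versions that rely on
anchors decode boxes differently. The function names and the numeric values are
chosen for illustration.

```python
def responsible_cell(cx, cy, img_w, img_h, S):
    """Grid cell (row, col) that contains the box centre (cx, cy) in pixels."""
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


def decode_box(row, col, tx, ty, tw, th, img_w, img_h, S):
    """Convert a cell-relative prediction to (x_min, y_min, x_max, y_max).

    tx, ty: centre offset inside the cell, in [0, 1].
    tw, th: box width and height as fractions of the image size.
    """
    cx = (col + tx) / S * img_w
    cy = (row + ty) / S * img_h
    w = tw * img_w
    h = th * img_h
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# A 448 x 448 image with a 7 x 7 grid and an object centred at (224, 100).
cell = responsible_cell(224, 100, 448, 448, S=7)
box = decode_box(cell[0], cell[1], tx=0.5, ty=0.5625, tw=0.25, th=0.25,
                 img_w=448, img_h=448, S=7)
```

Each cell additionally outputs a confidence score and class probabilities,
which are combined to keep only the final detections.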


Figure 8.2. Object classification using the YOLO algorithm in the context of
robot-human cohabitation. Figure labels: S×S grid on input; bounding boxes +
confidence; class probability map; final detections. In this result, the final
detections are robot, human and stool.
Figure 8.3. Deformable Part Model: model for the person category.


The performance of these models depends on their own architecture
(meta-architecture), but also on that of the backbone used. Among the CNNs most
used in this role, there are two families of state-of-the-art classification
models: VGG16 and its derivatives [17] and ResNet models (Deep Residual
Learning [18]).

The vast majority of these techniques output, for each detected object, a
rectangle called a “bounding box” surrounding the object in the image. The main
exception is Mask R-CNN, which additionally provides a “mask” giving the shape of
each detected object, consisting of all the pixels belonging to the object in the
image.


8.2.3 Human Detection Based on Deep Learning


Flexible objects (e.g. a person’s body) can take multiple appearances in the
image. This characteristic makes the task of detection/classification more
complex. From the 2010s, research laboratories worked on methods based on the
shape of objects of interest, merged with machine learning techniques, in order
to take into account all the possible configurations of the shape (feature
templates, e.g. the Deformable Part Model (DPM) [19], Figure 8.3). These
techniques relied on the use of local attributes (descriptors) such as the
Histogram of Oriented Gradients (HOG [20], Figure 8.4), and could be a
stand-alone solution or could be applied in combination with a background
subtraction method to decrease false negatives. This learning-based approach has
seen significant improvements with the advent of Convolutional Neural Networks
(CNNs) and their adaptation to object detection. The main problem with these
techniques in the factory context is the lack of robustness when partial
occlusion occurs. Indeed, especially in a production line, occlusion affects
people detection, making the task more complex.

A technique for people detection, called OpenPose, was recently proposed [21].
It takes into account both the variability of the observed shapes (due to the
fact that people are articulated objects) and the presence of partial occlusions.
OpenPose is based on a CNN architecture and makes it possible to detect different
characteristic points of the human body (joints, eyes, mouth, nose, ears, hands,
feet) and, jointly, to group these points in a graph forming a skeleton
representation (cf. Figure 8.5). More specifically, the skeleton detection
algorithm tracks human poses by detecting and estimating the position of the
characteristic points defining human postures. The approach creates heat maps for
joint extraction and extracts affinity fields considering all the detected joints
in order to infer the links between them and, consequently, allow the detection
of human limbs. The algorithm can simultaneously process different observation
scales. It should also be noted that it can detect people both by their
silhouette, when it is clearly visible, and by their head, which is more rarely
occluded.


Figure 8.5. An example of the results obtained using different detectors: (1)
moving object detection (GMM subtraction), (2) object detector to identify the
stool and the robot (YOLO), (3) human detector (OpenPose).


Once humans or other items are detected in the video footage, they are located
in the infrastructure. This absolute positioning of the elements of the scene
requires a camera calibration phase, in order to associate each pixel of the
image space with a coordinate in an absolute 3D coordinate system. Once an
element is detected, it is projected onto the ground, taking into account a
reference measurement (such as the height of the body or the robot dimensions).
The projection onto the ground makes it possible to estimate the actual 3D
position and then the distances to any other elements in the images (cf.
Figure 8.5).
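
For points that lie on the ground, one common way to realise this
pixel-to-world mapping is a planar homography, i.e. a 3 x 3 matrix obtained from
the camera calibration that maps image pixels to ground-plane coordinates. The
sketch below assumes that such a matrix `H` is already available and that the
detected “foot point” of each element is known in pixels; the chapter does not
state which calibration model is used, and the matrix in the example is a
made-up pure scaling (1 pixel = 1 cm) chosen to keep the arithmetic readable.

```python
import math


def apply_homography(H, u, v):
    """Map pixel (u, v) to ground-plane coordinates using a 3 x 3 matrix H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("pixel maps to a point at infinity")
    return x / w, y / w


def ground_distance(H, foot_a, foot_b):
    """Distance on the ground between two detections given their foot pixels."""
    ax, ay = apply_homography(H, *foot_a)
    bx, by = apply_homography(H, *foot_b)
    return math.hypot(ax - bx, ay - by)


# Hypothetical calibration result: 1 pixel corresponds to 0.01 m on the ground.
H = [[0.01, 0.0, 0.0],
     [0.0, 0.01, 0.0],
     [0.0, 0.0, 1.0]]

human_foot = (100, 200)   # pixel under the detected person
robot_base = (400, 600)   # pixel under the detected robot
distance_m = ground_distance(H, human_foot, robot_base)
```

With this made-up matrix the two detections are about 5 m apart. A real
homography would include perspective terms in the last row of `H`, which the
division by `w` accounts for.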


8.3 Conclusion 


We have given a short overview of the detection of elements in the scene, with
the purpose of defining empty spaces for robot navigation. Based on these machine
learning solutions, we will develop new approaches able to analyse the global
scene, alert the workers to potential danger and feed the robot path planner.
In the context of STAR, the robot should be able to detect obstacles in order
to avoid collisions. These video analytics will provide real-time input to feed
a “planner” with the areas that should be avoided.
Human in the Loop of AI Systems
in Manufacturing


By Christos Emmanouilidis and Sabine Waschull 


Artificial Intelligence (AI) in manufacturing is typically looked upon from the
viewpoint of its contribution to automation. Additionally, the role of AI in
augmenting human activities has been the subject of a wide range of studies with
impact on practical applications in manufacturing environments. Recently, the
empowering effect of human and AI actors working in synergy has attracted
increased attention. After outlining relevant work, this chapter considers the
potential emergent outcomes of such a synergy in a way that goes beyond
automation or augmentation. Aimed at both developers and work designers, the
present work proposes a model of human-AI interaction along with an outline of
key concepts and success criteria towards making human-AI interaction more
effective.
9.1 Introduction


Early work on the integration of human and technical actors in sociotechnical sys- 
tems has given rise to the field of Human System Engineering (HSE), defined as 
the application of principles, models, and techniques to system design, taking into 
account human capabilities and limitations [1]. The HSE field has evolved substan- 
tially, leading to significant advances in Human-Systems Integration (HSI). While 
HSI has matured to take into account automation and ergonomics considerations,
it has been far less concerned with the integration of Artificial Intelligence (AI).
Yet, the integration challenges associated with introducing AI elevate the need to
consider the joint optimisation of the human and technical systems’ capabilities to
a level that incorporates the potential outcomes of humans and AI agents acting
in synergy. A sharp contrast between human and non-human actors is that of the
understanding and intentionality of actions: people should understand the purpose
of actions, whereas technical systems aim at best to perform “as instructed”.
This “as instructed” can be broad enough to encompass different instruction sub- 
jects: designers, operators, or even programs. Automation has empowered techni- 
cal actors to create, process, and execute complex “instructions” in highly efficient 
ways. AI significantly raises the capabilities of automation systems to handle
or respond to situations that automation alone would not suffice to handle, but
often still lacks the versatility of human cognitive capabilities to deal with
uncertain, incomplete, or generally less well-defined contexts. As a result, human
involvement in such tasks is valuable. However, human actions are typically more
effective when
human operators are acting within a shared understanding of the work activities 
context and this “situational awareness” is recognised in a “collective activity” view 
of work environments [2]. According to such a view, “collective activity” should
not be viewed as the sum of individual activities, but through the evolving
interaction of the actors which contribute to it. Therefore, when considering
the interaction between human and non-human actors (including AI) in manu- 
facturing environments, it is not sufficient to consider how human actions can be 
augmented by technical actors or how technical systems can be aided by humans. 
Instead, it is more relevant to analyse and understand the emergent outcomes of 
their interaction. While the interaction between human actors and automation 
agents has been the subject of numerous research studies, previous works mostly 
consider how AI supports humans [3, 4]. Beyond this, AI has recently been consid- 
ered through the situational awareness perspective, as a means of collective activity 
effectiveness, mostly through seeking explainability and transparency in its out- 
comes in the form of eXplainable AI (xAI) [5]. As the introduction of Industry 
4.0 technologies has reshaped work roles in profound ways [6], the collective 
activity viewpoint is appropriate for the understanding, analysis and design of the
human-AI interaction in manufacturing. The need to consider human-centricity
at the design stage of sociotechnical systems should be extended to the design of 
human-centred AI when considering manufacturing workplaces [7]. Starting from 
the empowering effect that the interaction between technical and human actors 
can have [8], a collective activity perspective can place this empowerment in the 
context of action affordances. The term “affordance” denotes “action possibili- 
ties provided to the actor by the environment” [9]. But such possibilities have a 
relational nature, i.e. they refer to interaction possibilities relevant to a specific 
actor operating in a specific environment. This renders the augmentation
viewpoint too simplistic. Incorporating human activities as steps in a broader
AI-driven process, termed “human in the AI loop”, implies more than augmenting the
algorithm [10]. Although the physical and cognitive support capabilities offered by
Industry 4.0 technologies to human workers are acknowledged [11], the actual
integration of human cognitive capabilities in the AI loop is less well understood [12].
Furthermore, in most cases the AI outcomes are characterised by a lack of trans- 
parency and explanation and so, being poorly understood, they are not sufficiently 
trusted by humans. Therefore, a shared context through situational awareness in
human-AI interaction is hardly established, limiting the effectiveness of human-AI
integration.

The aim of this chapter is to contribute towards more effective human-AI
integration in manufacturing environments by looking at the potential contribution
of the human in the AI loop and then translating this into recommendations for the
effective integration of humans and AI from a work design perspective. This takes
the form of a conceptual model of human-AI interaction, which was put to the test
in co-creation workshops where different stakeholders contributed to the design of
human-centric solutions for manufacturing lines. The co-creating stakeholders
worked through different scenarios to produce a synthesis of interaction
possibilities into new designs, while also stating expectations for the desired
outcomes of the new approaches. The proposed model and co-creation
design practice can be of practical value to system and work designers alike, when 
seeking to integrate humans and AI in human-centric deployments in manufacturing
environments. The rest of the chapter is structured as follows. Section 9.2
outlines how human and AI actors interact in different AI processes and highlights
the potential benefits. Section 9.3 introduces a conceptual model of interacting
human and non-human (AI) actors in production environments. Section 9.4 considers
the work design implications of such collective activity. Section 9.5 presents the
outcomes of putting both the conceptual model and the work design implications
to the test in a stakeholder co-creation workshop setting. Section 9.6 concludes
the chapter.


9.2 Humans and AI in Sociotechnical Production Systems


There is no single agreed definition of what constitutes AI, but to the extent that
intelligence characteristics are associated with thought processes and behaviours,
the expectation for an AI agent would be to exhibit at least some of those
characteristics. The thought processes are typically looked upon from the
cognitive systems and logic viewpoints, while the behavioural ones may result from
applying concepts, methods, and practice related to machine learning, knowledge
representation and reasoning, natural language processing, and agent-based
systems [13]. Despite the potential of AI to take on human tasks, e.g. the
automation of physical, cognitive, discretionary and decision-making tasks [4], there is a
growing consensus among researchers and practitioners to design human-centric 
technologies which integrate rather than eliminate humans and their capabilities 
[12, 14, 15]. 

While the majority of such human-in-the-loop scenarios consider how AI aug- 
ments humans [3], the opposite (i.e. humans aiding AI) also holds significant 
potential for the successful joint integration of humans and AI agents in produc- 
tion environments [10, 12]. The advances made in the practical application of AI, 
involving scenarios of automation and augmentation of human work [4], create the 
need to better understand the interactions between human and technical actors in 
AI-based production environments. Human augmentation has received extensive
attention in manufacturing, both from the viewpoint of technology enablers for 
such augmentation, as well as regarding the functional and domain-specific out- 
comes of the augmentation. Enabling technologies for human augmentation in 
manufacturing include web-services for ubiquitous computing [16], multimodal 
interfaces [17], augmented [18] and virtual [19] reality, context-adaptive com- 
puting [20], exoskeletons for physical augmentation [21], and natural interfaces, 
including speech [22] and brain-computer interfaces [23].

There are multiple types of activities wherein human actors can aid AI; however,
reported implementations in manufacturing environments have been far fewer than
those of human augmentation [12]. Yet, the potential contribution of humans
towards AI agents [10] can be very influential and valuable, even when applied to 
the most data-driven part of an AI process, that of machine learning [12]. 


9.3 Meta-Human Learning in Sociotechnical Systems 


Consistent with the concept of collective activities, this section introduces a model 
of collective or meta-human learning, where the interest is not confined to what a 
specific human actor or a specific AI-driven one can learn. Instead, the interest is in
the emergent learning capabilities of the broader sociotechnical system. 

A simplistic view of the interaction between human and AI actors is that human and
non-human actor capabilities are static, and so are their affordances in given
technical environments. The superiority of human cognitive capabilities over AI in
performing cross-domain activities is not a controversial statement, and likewise
the superiority of AI in data-intensive tasks has been demonstrated, leading to the
definition of a range of cognitive systems architectures [24]. Efforts to overcome
the limitation that AI performs well only within narrow contexts have mostly
focused on transfer learning [25], which aims to transfer the learned capabilities
from the original domain of the learning to a new one. There have been various
examples of integrating human knowledge into machine learning [12, 26], but this is
only one aspect of the integration. There is also a motivation to examine how to
enable human and non-human actors to benefit from each other's capabilities and
from the empowering effect that they can have on each other [8]. To this end, a
conceptual model is considered in which the collective contribution of such actors
acts as a meta-human learning system [27]. The term
meta-human learning refers to the emergent “learning” capabilities of the overall
sociotechnical system. For example, an AI-driven system that is embedded in
process monitoring can be effective in a range of tasks, but may not have been
trained to perform well on others. Rather than attempting to offer loosely founded
and potentially erroneous outcomes, the AI-driven monitoring system may flag
cases not seen before. An expert user may be prompted to assign these cases to
existing concepts or states, thus expanding the knowledge domain of the original
AI model [28]. Such an interaction process has become popular and is recognised
as a form of Active Learning [29-31]. The connection of cases to concepts may
actually be the product of collective user interaction and annotation of cases, a
process which can be seen as a joint Linked Data and Knowledge Management
activity, and which can be instantiated on knowledge graph constructs [32].
Starting from such concepts about human-AI interaction proposed in [8], and
incorporating ideas about introducing human cognitive capabilities in the AI
loop [12], the model proposed in this work is illustrated in Figure 9.1.
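
The flag-and-annotate interaction described in the process monitoring example can
be sketched in a few lines of Python. This is only a minimal illustration under
stated assumptions, not the STAR implementation: the nearest-centroid model, the
distance-based novelty threshold, and the names MonitoringModel, monitor and
ask_expert are all hypothetical choices made for this example.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


class MonitoringModel:
    """Toy nearest-centroid monitor that abstains on unfamiliar cases (assumed design)."""

    def __init__(self, novelty_threshold: float) -> None:
        # Cases farther than this from every known state are treated as "not seen before".
        self.novelty_threshold = novelty_threshold
        self.examples: Dict[str, List[Tuple[float, ...]]] = {}

    def add_example(self, state: str, features: Vector) -> None:
        """Store an annotated case under a known (or new) state."""
        self.examples.setdefault(state, []).append(tuple(features))

    def _centroid(self, state: str) -> List[float]:
        rows = self.examples[state]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def predict(self, features: Vector) -> Optional[str]:
        """Return the closest known state, or None when the case looks novel."""
        best_state, best_dist = None, math.inf
        for state in self.examples:
            dist = math.dist(features, self._centroid(state))
            if dist < best_dist:
                best_state, best_dist = state, dist
        return best_state if best_dist <= self.novelty_threshold else None


def monitor(model: MonitoringModel, features: Vector,
            ask_expert: Callable[[Vector], str]) -> str:
    """Classify a case; if the model abstains, query the expert and learn from the answer."""
    state = model.predict(features)
    if state is None:
        state = ask_expert(features)        # human in the AI loop
        model.add_example(state, features)  # the annotation expands the model's domain
    return state
```

In this sketch the expert is consulted only for flagged cases, and each annotation
is fed back into the model, so a similar case arriving later can be handled
without a further query.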

Human actors, capabilities and interaction affordances within the sociotechnical
manufacturing environment are marked in green. Technical actor capabilities,
including those of existing operational and information technology as well as those
of AI actors, are marked in blue. All actors exhibit certain capabilities, which
can be expressed in a range of interaction affordances given the context of the
operating sociotechnical environment. Technical actors empower humans
to expand their capabilities, inform them about relevant process status or
knowledge, train them on certain tasks, and explain (through AI) automated outcomes
or recommendations, but they also constrain human affordances within a range of
admissible actions. These are concepts that will be further explored in the
industrial use cases of the STAR project.

[Figure 9.1 labels: “Empower, inform, constrain, train, explain”; “Impact on
Sociotechnical Systems”; “Enhance, inform, control, train”]

Figure 9.1. Meta-human learning through collective activity of actors.


9.4 Meta-Human Learning Considerations for Work Design


Earlier sections argued in support of human and AI actors engaging in a collective
activity, but such engagement may also give rise to different cognitive and mental 
demands on humans, as well as a change in the way physical activities are per- 
formed. How exactly these are likely to affect the overall performance of the opera- 
tions system remains an open question [33]. It is important to design the interaction 
in such a way that the resulting work characteristics lead to positive outcomes for 
the workers and the organization at large. 

The earlier discussion raises questions about how to effectively design such an 
integration to deliver improved performance [34]. To answer this question, it is nec- 
essary to take a work design viewpoint. The physical, cognitive and mental demands 
for workers may affect the overall performance of the operations system [33]. Var- 
ious streams of work design theory came together in the seminal work of [35]. 
An overview is given in [36], including integrative perspectives that provide links 
between the earlier streams. Work design theory provides a set of work characteris- 
tics that should be considered when (re)designing jobs in response to technological 
and social changes to achieve different individual and organizational purposes. As 
such, it is important that the adoption of AI in industry pays due attention to
these characteristics, so as to deliver a design approach for human-AI integration
that produces positive outcomes for the individuals and the organization at
large. The focus is on the work characteristics that arise from the task
environment and the social environment, and excludes those related to
the broader physical and organizational environment (contextual characteristics).
The terminology is taken from [37]. 

An influential task characteristic is autonomy. Autonomy refers to the amount
of freedom that a human has during the work in terms of timing of the work, choice 
of methods, and the ability to make decisions. Jobs that lack autonomy are consid- 
ered poorly designed. AI may impact autonomy in positive and negative ways [7]. 
Task variety considers the range of tasks that humans need to perform in their job, 
while skill variety relates to the required skills to perform the job. AI may replace 
routine cognitive tasks, but also create new tasks, requiring new skills from humans 
who are interacting with the system. The task and skill variety should match the
abilities and needs of the individual worker. The same holds for job complexity: 
too little and the job is without challenges; too much creates fatigue and stress. 
AI may impact job complexity by altering the cognitive demands. Feedback from 
the job, i.e. being able to evaluate the quality of work while it is being performed, 
is another task characteristic. In general, AI may improve the feedback from the 
job due to sensor technology and visual devices that can provide such feedback. 
AI may contribute substantially by providing more intelligence to the feedback. 
Conversely, a poor division of tasks between the AI agent and the human agent may
also lead to decreased opportunities for learning and impaired situational aware- 
ness. Specialization refers to the extent to which a job involves the performance 
of tasks requiring specific knowledge and skill, and AI may empower humans to 
take on a variety of tasks by supplementing knowledge and enhancing capabilities, 
but it may also shift human work to focus on a narrow set of specialized tasks. 
Problem solving in the job is again a task characteristic which should be
challenging, but not too challenging for the individual employee. AI can solve
routine problems and create new complex problems for humans to focus on. Finally,
information processing is a task characteristic which should match the cogni- 
tive capabilities of the worker, and which is highly influenced by digitization in 
general. 

There are also some relevant work characteristics related to the social environ- 
ment that may be impacted by the adoption of AI. Traditionally, these charac- 
teristics reflect the relations among workers. However, in modern manufacturing 
environments where Industry 4.0 technologies such as AI are applied in produc- 
tion systems, these concepts may also relate to interactions between humans and 
AI agents. Interdependence traditionally reflects the extent to which humans
connect to each other, but may be expanded to also reflect the connection between
humans and AI actors. Integrating the human in the AI loop implies an increased
dependency between both actors. Similarly, AI may facilitate social support by
providing valuable connections between team members and enhancing their com- 
munication. Similar effects may be expected for the enhancement of the amount 
of feedback from other humans (peers or supervisors). To summarize, digitization, 
and specifically AI, can have both positive and negative impacts on many work 
characteristics. Therefore, its impact should be carefully considered in a
human-centric AI design.


9.5 Co-Creation Workshops 


Human-centricity, safety and trust have been defined as key success criteria for
the development of AI technologies in the STAR project. To drive this development
and measure the achievement of these criteria, a series of co-creation workshops
was organised. In participatory design processes, users play an active role in all
phases of the project by
proposing ideas and providing suggestions. As opposed to solely assessing the soft- 
ware artifact, users actively and directly participate in the design process activities 
through shared experimentation, mutual learning and reflection [38, 39]. Within 
STAR, each pilot demonstration site will conduct a number of co-design sessions 
throughout the design, development and testing process of the AI technology, facil- 
itating participation to a feasible extent. By means of multiple methods and tools, 
input and feedback will be collected and synthesized for relevant stakeholders. The 
project co-ordinator’s agile development process will ensure that resulting requests 
for changes to the software are incorporated in the interim and final releases. To
increase the study's validity, each session will include multiple stakeholders to assess
and validate the technical artifacts throughout their development cycle. Workshops 
have been planned for both the definition and design, as well as the two devel- 
opment and testing phases at each pilot site. The development and testing phase 
workshops will contain an evaluation part (focus groups) and a co-creation part. 
An overview of the workshops and their goals can be seen in Figure 9.2. 


[Figure 9.2 content]
Workshop 1, Definition and Design (co-create & validate, M6): functional
requirements, user stories, collaboration scenarios.
Workshop 2, Early Development and Testing (co-create & evaluate, M21):
functionalities, usability targets, success criteria.
Workshop 3, Final Development and Testing (co-create & evaluate, M26):
functionalities, usability targets, success criteria, user testing scenarios, wider
usage scenarios.


Figure 9.2. STAR co-creation workshops. 


Definition and Design Phase Workshops (W1)


W1 workshops have been planned for the definition and design phase of the AI
technology and the pilot user scenarios. In the first part of the workshop,
participants are asked to validate and evaluate the functional and non-functional
requirements and user stories in a focus group. In the second part, they co-create
different collaboration scenarios based on a pre-defined set of relevant success
criteria, which is open to additions or modifications. Scenarios address (1) how
humans can help/augment the AI, (2) where AI technology can help/augment humans,
(3) where AI substitutes humans, and (4) where AI and humans are integral parts of
a process, with no clear unidirectional support for each other.


Early Development and Testing Phase Workshops (W2) 


During W2, co-creation and evaluation activities will be undertaken addressing the 
functionalities of the first version of the pilot systems available (Early Design and 
Development). Participants will visualize, simulate and experiment with the pilot 
system, supported by mock-ups and prototypes. Based on that, participants can 
develop and propose improvement ideas and system testing scenarios. Moreover, 
the usability targets and success criteria will be evaluated. The workshop is planned 
after the first iteration of STAR components and systems have been deployed at 
pilot sites (Interim version prototype implementation of pilot systems). 


Final Development and Testing Phase Workshops (W3) 


The content and focus of Workshops W3 are similar to W2, but the co-creation 
and evaluation activities will be focused on the final version of the pilot systems. 
In addition to the aims of W2, the final workshops will propose final
user testing scenarios, as well as usage scenarios for wider stakeholders external to
the project. Workshops W3 will be scheduled a few months before the delivery of
the final prototype implementation of the pilot systems, to provide sufficient
impetus and time for acting upon their outcomes.

An outline of the co-creation workshop targets can be seen in Table 9.1.
Specifically for the W1 workshops, the initial pool of success criteria was defined
by taking into account both pilot targets and work design characteristics, as well
as a survey among STAR project partners, as a representative expression of a wider
stakeholder view involving industrial end users, technology providers, legal and
ethics experts, and research organisations. These success criteria can be seen in
Figure 9.3. Each workshop was “seeded” with initial user stories, potential
components and desired functionalities to satisfy the user stories, as well as an
initial list of tasks relevant to the role of humans and AI, and was accompanied by
a mapping of the process workflow under consideration in each pilot use case.
Co-creation workshops have

Table 9.1. Co-creation workshop targets.

Co-creation Workshops                         Workshop Target
Definition and Design Workshops (W1)          • Validate requirements
                                              • Validate user stories
                                              • Co-create task scenarios
Early Development and Testing Workshops (W2)  • Co-create testing scenarios
                                              • Obtain early feedback from stakeholders
                                                (testing and validation)
Final Development and Testing Workshops (W3)  • Final testing and validation scenarios
                                              • Assess future prospects and suggest areas
                                                for further development within and beyond
                                                the project


[Figure 9.3 content: a checklist of candidate success criteria. Group labels:
“Application / domain specific”, “Cross-domain but with application-specific
variations”, “Relevant to functional requirements”, “Relevant to non-functional
requirements”. Criteria: Performance; Reliability; Safety (Human Safety, Technical
Asset Safety, Environmental Safety); Security; Usability; Job Enrichment; Physical
Support; Cognitive Support; Social Support; Other Relevant Criteria?]

Figure 9.3. Pilots co-creation workshops initial pool of success criteria. 


already been conducted for two of the three pilot cases regarding the definition and 
design stage. However, as this is an ongoing activity, the present work only includes 
preliminary information, as the full synthesis of the outcomes has been planned 
for the next period. Due to the COVID-19 pandemic restrictions, the co-creation 
workshops were held using the MIRO¹ collaboration tool. For each workshop, initial
collaboration boards were set up, breaking down the collaboration activities into
the following stages:


1. User stories definition and adaptation
2. Functional requirements and components definition
3. Linking user stories with functional requirements and components
4. Classifying human-AI activities into sub-categories: (i) AI augments
humans, (ii) humans augment AI, (iii) AI substitutes humans, (iv) human-AI
integrated tasks


1. https://miro.com/. 


Figure 9.4. Co-creation collaboration board activity. 


5. Drafting work design/human effects, as well as operational effects and success
criteria (following task characteristics mentioned in Section 9.4 and as seen in
Figure 9.3)


An example of a MIRO collaboration activity can be seen in Figure 9.4. Dur- 
ing the co-creation workshops the collaborating partners were able to produce an 
enhanced and expanded canvas of user stories, components, and design elements 
regarding human-AI synergies and expected outcomes. However, the full analysis
of the co-creation workshop outcomes will be part of a forthcoming report and 
publication. 


9.6 Conclusion 


This chapter considered the interaction between human and non-human actors in 
industrial environments. It analysed the nature and benefits arising from this
interaction and highlighted that AI-enabled manufacturing environments do not just
benefit from better performing humans or machines, but also from their expanded 
capabilities. To unleash the interaction benefits, design approaches for the effective 
integration of human and AI actors in manufacturing are needed. This necessitates 
interdisciplinary synergies involving manufacturing operations, AI, as well as work 
design for Industry 4.0. To this end, the chapter presented a synthesis of key
synergies between human and AI actors, proposed a model of emergent meta-human
learning in sociotechnical systems, and provided directions for a work design
viewpoint for such an effective integration. As part of the STAR collaborative research
project, which integrates manufacturing industries, technology providers, research 
organisations, and legal/ethics stakeholders, current research analyses industrial case 
studies involving industrial assembly and quality inspection, agile production, and 
human-cobot collaboration in Industry 4.0 environments. Furthermore, it examines
industrial requirements and success criteria, including overall operational
performance, technical system components performance, and human and job effects
of AI. The studied human and job effects are an elaboration of factors considered in
Section 9.4, while aspects of human-AI interactions, discussed in Sections 9.2 and
9.3, are also assessed. The reported work has placed the initial findings as “seeds” for
a series of co-creation workshops which have been designed to take place at both the
design and the implementation and testing phases of the STAR system and its
components. Among the co-creation seeds was an initial pool of success criteria for
the overall sociotechnical system, such as reliability, performance, safety (human,
technical, environmental), security, usability, worker support (physical, cognitive,
social), and job enrichment, to determine an effective design approach for the
integration of humans and AI in production environments. Further work will progress
with the full analysis and synthesis from the design phase co-creation workshops. 
The outcomes of this analysis are seen as valuable input for the STAR project design 
approach for trusted and safe human-centric AI in manufacturing environments. 


Chapter 10


A Review of Industrial Standards for AI in Manufacturing


By Eva Coscia, Rubén Alonso and John Soldatos 


This chapter provides a review of industrial standards relating to Artificial Intelli- 
gence (AI) in manufacturing, including: (i) recommendations for human centric 
manufacturing systems; and (ii) technical standards for safety, security and data 
management. 


10.1 Introduction 


This chapter provides a review of industrial standards relating to Artificial
Intelligence (AI) in manufacturing, including recommendations for human centric
manufacturing systems and technical standards for safety, security and data
management. The objective of this chapter is to highlight some of the most
relevant standards, while detailing the possible considerations when designing and
developing AI applications for the manufacturing industry such as those developed
in the STAR¹ project.

The number of published standards, recommendations and reference architec- 
tures that can influence the development of AI solutions for the manufacturing 
sector is considerable, and as technology advances, new standards and
recommendations are being developed. In this chapter we therefore aspire only to
provide some pointers, either through links to standardization groups or by giving
more detail on a selection of the standards that we have found relevant for the
development of AI systems in the manufacturing industry.

The chapter is structured as follows: first we present a review of the literature,
citing various efforts to compile a list of standards. Then, we present several
notable standards and reference architectures. Finally, we present our conclusions
and the need for further monitoring of the progress of various standards.


10.2 State of the Art Analysis 


There have been several efforts in the literature to conduct an analysis of standards 
for the manufacturing sector. In general, due to the wide scope and the extensive 
collection of standards available, these analyses, like ours, focus on a specific part 
of the spectrum. Some of the most relevant analyses are detailed below. 

Choi et al. [1] presented an analysis based on the Factory Design and Improvement
(FDI) process and the ISA-88 hierarchical model of manufacturing operations. In
their document they focus on PPR (Product, Process, Resource) standards,
categorizing standards related to Product data (e.g. DXF (Drawing Interchange Format),
IGES (Initial Graphics Exchange Specification), VRML (Virtual Reality Modelling
Language)), Process data (e.g. OAGIS (Open Applications Group Integration
Specification), ANSI/ISA-95) and Resource data (e.g. B2MML (Business to
Manufacturing Markup Language), AP242), and examining their coverage of the FDI
functional matrix.

Li et al. [2] reviewed several smart manufacturing standards and analyzed several 
industrial architectures. In particular, they focused on the standards developed by 
the following standard development organizations (SDOs): 


• ISO/TC 184 (Automation systems and integration), which develops standards
related to information systems, control devices or data integration and
interoperability.


1. STAR = Safe and Trusted Human Centric Artificial Intelligence in Future Manufacturing Lines
(https://star-ai.eu/).
• IEC/TC 65 (Industrial-process measurement, control and automation),
focused on activities that impact the integration of components, as well as
different aspects of such systems, such as safety and security.

• ISO/IEC JTC 1 (Information technology), focused on ICT standards in
different scopes, including security, multimedia or smart cards among others.


The National Institute of Standards and Technology (NIST) published in 2016 
a landscape of standards focused especially on Smart Manufacturing Systems, 
which among other things details standards related to the different phases of 
product development, from design to end-of-life and recycling. This landscape 
covers modeling, data exchange, production system engineering and operation 
and maintenance standards, and identifies 8 priority areas in which standard- 
ization should advance: (i) Smart Manufacturing System reference model and 
reference architecture; (ii) Internet of Things reference architecture for manu- 
facturing; (iii) Manufacturing service models; (iv) Machine to machine com- 
munication; (v) PLM (Product Lifecycle Management)/MES (Manufacturing 
Execution System)/ERP (Enterprise Resource Planning)/SCM (Supply Chain 
Management)/CRM (Customer Relationship Management) integration; (vi) 
Cloud manufacturing; (vii) Manufacturing sustainability; and (viii) Manufacturing 
cybersecurity. 

Ziegler [4] analyses the standardization landscape in the AI field, focusing
on the standards developed by five international and European SDOs (IEEE,
ISO/IEC, ITU-T, ETSI, CEN-CENELEC) and two standards setting organizations
(SSOs): W3C and IRTF (Internet Research Task Force).

One of his main points is that, although the SDOs and SSOs are active in
the AI field, standardisation activities are limited: their number is low and
is not increasing at the same rate as developments and applications.

A new European player in standardisation activities, which aims to extend
the list of available standards on AI, is the OASIS Open Europe Foundation
(OOEF).² OOEF is the European sovereign affiliate organisation of the
international nonprofit OASIS Open, and works to advance and support Europe's role
in open source and open standards development. OOEF activities include
participation in collaborative projects supported by the EU and EU Member States,
organization of and participation in events promoting the adoption of open source
projects, and engagement in European-specific activities to progress open source
and open standards. The list of standards relevant for the manufacturing domain and
AI technology covers communication/messaging protocols (AMQP (Advanced
Message Queuing Protocol)³, MQTT⁴), cloud service management (TOSCA
(Topology and Orchestration Specification for Cloud Applications)⁵), privacy
management (PMRM⁶), security and production planning (PPS⁷), among other
topics.


2. https://www.oasis-open.eu/


10.3 Standards Overview 


As STAR is a project concerned with AI and with safe and secure systems in the
manufacturing domain, we have focused on reviewing standards from three major
groups: (i) Technical, Management and Security Standards; (ii) Safety and Health
Standards; and (iii) Other Relevant Standards. The list and categories were
suggested by project partners (i.e., technology providers, middleware developers
and end-users, both SMEs and larger firms), since these standards have
implications for the activities they perform. The following sections discuss
some of these standards and their impact on both the industry and the suppliers
of AI solutions for the industry.


10.3.1 Technical, Management and Security Standards 


It is essential to begin by highlighting several of the families of ISO and IEC stan- 
dards related to security, like the well-known ISO 27001, 27002 and 27701. 

ISO/IEC 27001 [10] (Information technology — Security techniques — Informa- 
tion security management systems) specifies the requirements for implementing an 
information security management system (ISMS), and enables the assessment and 
treatment of security risks, and the implementation and management of security 
controls. This standard covers three main points that affect IT systems in the
manufacturing industry: (i) the understanding and monitoring of security risks
(including knowledge of potential vulnerabilities, threats and their impact);
(ii) the design and implementation of security controls to reduce the risks; and
(iii) the implementation of a process for the continuous management of
information security requirements.

ISO/IEC 27002 [11] (Information technology – Security techniques – Code of
practice for information security controls) is applicable to organisations
(public or private) of any type and size, including SMEs. It provides guidelines
for the implementation and management of security controls, allowing
organisations to select controls for implementing ISO 27001-based ISMSs. The
main objective of this standard is to facilitate the selection of controls from
a set of well-known security controls and to assist in the creation of
guidelines for the management of the organization's ISMS. The standard covers
issues related to data access, cryptography, asset management and information
exchange, all of which are important when designing safe and secure systems for
manufacturing.


3. https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=amqp
4. https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=mqtt
5. https://www.oasis-open.org/committees/tosca/
6. http://docs.oasis-open.org/pmrm/PMRM/v1.0/cs02/PMRM-v1.0-cs02.html
7. https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=pps

ISO/IEC 27701 [5] (Security techniques — Extension to ISO/IEC 27001 
and ISO/IEC 27002 for privacy information management — Requirements and 
guidelines) addresses the privacy and data protection perspective and aims to
enhance the Information Security Management System with additional privacy
information management requirements. The standard allows Personally
Identifiable Information (PII) Controllers and PII Processors to manage privacy
controls, and helps organisations implement efficient Privacy Information
Management Systems, including the policies and procedures for personal
information management that are needed to align with privacy and data
protection regulations.

In addition to these privacy and security standards, it is worth acknowledging
the efforts of other technical committees that are publishing standards related
to AI, security and secure data exchange between devices and factory systems.

W3C XML security standards⁸ are a family of neutral, open, vendor-independent
and freely available standards that specify security extensions for the usage
and interchange of XML data, one of the most widely used document encoding and
data exchange formats. Among them are: XML Signature, for integrity, signer
authentication and message authentication; XML Encryption, which specifies the
process of encrypting data and representing the result in XML format; and XKMS,
which defines protocols for registering and distributing public keys for use,
for example, with XML Signature and XML Encryption.

8. https://www.w3.org/standards/xml/


One markup language used in the industry is AutomationML, which is covered by
IEC 62714 [12] (Engineering data exchange format for use in industrial
automation systems engineering – Automation Markup Language). The purpose of
this standard is to specify a data exchange format tailored to the needs of
production system engineering and to enable the exchange of information among
heterogeneous engineering tools along the entire life cycle of production
systems. AutomationML combines and adapts existing industry standards to reduce
data interchange and integration issues. The standards currently integrated in
AutomationML are: CAEX, for object topologies, hierarchies and properties;
COLLADA, for geometries and kinematics; and PLCopen XML, for the discrete
behavior of objects.
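
To make the role of CAEX in such an exchange more tangible, the following minimal sketch walks a simplified, CAEX-like plant hierarchy using only the Python standard library. The element names (CAEXFile, InstanceHierarchy, InternalElement, Attribute, Value) follow CAEX naming, but the fragment leaves out namespaces, role classes and interfaces, and the plant content and the helper names are invented for illustration. It should be read as a sketch under these assumptions, not as a complete or validated AutomationML document.

```python
import xml.etree.ElementTree as ET

# Simplified CAEX-like fragment (illustrative only: no namespaces, no schema).
CAEX_SNIPPET = """
<CAEXFile FileName="demo_plant.aml">
  <InstanceHierarchy Name="DemoPlant">
    <InternalElement Name="AssemblyLine1">
      <InternalElement Name="Robot1">
        <Attribute Name="payload_kg"><Value>10</Value></Attribute>
      </InternalElement>
      <InternalElement Name="Conveyor1">
        <Attribute Name="speed_m_per_s"><Value>0.5</Value></Attribute>
      </InternalElement>
    </InternalElement>
  </InstanceHierarchy>
</CAEXFile>
"""


def walk(element, path=()):
    """Yield (path, attributes) for every InternalElement below `element`."""
    for child in element.findall("InternalElement"):
        child_path = path + (child.get("Name"),)
        attributes = {
            attr.get("Name"): attr.findtext("Value")
            for attr in child.findall("Attribute")
        }
        yield child_path, attributes
        # Recurse so that nested objects keep their position in the topology.
        yield from walk(child, child_path)


def list_plant_objects(xml_text):
    """Return (hierarchy path, attribute dict) pairs in document order."""
    root = ET.fromstring(xml_text)
    hierarchy = root.find("InstanceHierarchy")
    return [("/".join(p), a) for p, a in walk(hierarchy)]


if __name__ == "__main__":
    for path, attrs in list_plant_objects(CAEX_SNIPPET):
        print(path, attrs)
```

Any engineering tool that can read such a hierarchy can reconstruct the same object topology and properties, which is the interoperability benefit the standard targets.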

ETSI Cybersecurity and AI Standards⁹ offer market-driven cybersecurity
standardization solutions, recommendations and guidelines, and aim to improve
the security of AI. The groups behind them (ETSI CYBER and ETSI SAI) have a
twofold objective: to understand and reduce cross-domain cybersecurity
implications, and to ensure that artificial intelligence is secure. Network
security, the security of sensors and IoT devices, cybersecurity tools and
machine-to-machine security are some of the topics covered by the ETSI CYBER
standards. ETSI SAI focuses on three main topics, all of which affect the
manufacturing domain: (i) securing and protecting AI components from attacks;
(ii) mitigation against malicious and dangerous AI; and (iii) using AI to
improve security measures. Notable documents from these committees are ETSI GR
SAI 004 (Securing Artificial Intelligence (SAI); Problem Statement), ETSI GR SAI
005 (Securing Artificial Intelligence (SAI); Mitigation Strategy Report) and
ETSI TR 103 787-1 (CYBER; Cybersecurity for SMEs; Part 1: Cybersecurity
Standardization Essentials).

ISO/IEC 18033 [13] (Information technology — Security techniques — Encryp- 
tion algorithms) is a family of standards providing definitions, recommendations 
and guidelines for data encryption and confidentiality. This family of standards
includes, so far, five types of ciphers and encryption methods: (i) asymmetric
ciphers; (ii) block ciphers; (iii) stream ciphers; (iv) identity-based ciphers;
and (v) homomorphic encryption. Both block ciphers and stream ciphers are in
everyday use for transmitting information in industry and other sectors. Public
key mechanisms are also used for integrity and authenticity (e.g.,
certificates). Identity-based and homomorphic systems are currently less widely
used, but are of interest for simplifying asymmetric encryption and for
performing operations on encrypted stored data, respectively.

ISO/IEC 29100 [14] (Information technology — Security techniques — Privacy 
framework) establishes a framework for the protection of PII (biometric identifiers, 
names and surnames, location information, ...) and establishes recommendations 
on how PII should be identified, how data should be controlled and how this data 
should be transmitted. This framework is applicable to various industries, includ- 
ing the manufacturing sector, and allows the organization to control security risks, 
comply with legal requirements, and reduce potential privacy breaches, which in 
turn can impact the organization's image. 


9. https://www.etsi.org/committee/cyber and https://www.etsi.org/committee/sai 
ISO/IEC 27040 [15] (Information technology – Security techniques – Storage
security) defines storage security terminology, details some scenarios related
to the secure storage of data, and provides guidance on the security aspects of
storage and storage technologies. Data storage and warehousing is a key issue in
manufacturing information systems, and especially in the use of AI in the
manufacturing industry. AI models require data both for training and for
operation, and this data is stored for decision making or for visualization.
Deciding how, where and with what protection to store this data requires knowing
the risks and implementing a secure storage approach.


10.3.2 Safety and Health Standards 


In his dissertation on mental workload monitoring in the manufacturing
industry, Van Acker [6] reviews several studies that show the relationship
between fatigue and frustration on the one hand and safety risks and loss of
quality or performance on the other, and that describe how changes in workers'
physical and mental health lead to changes in burnout and job satisfaction.
Ergonomics, i.e., the correct design of tasks and workplaces, has a direct
impact on the safety and security of workers. Incorporating standards related to
ergonomics and to improving the quality of the worker's environment therefore
also affects the safety of innovative manufacturing environments. Among these
standards, we can mention ISO 10075 [16] (Ergonomic principles related to mental
workload). This standard defines terms related to mental workload, stress and
strain, their consequences and the relationships between them. It also suggests
methods to measure and assess mental workload, provides requirements for
measurement instruments, and gives guidance on the design of the workplace,
equipment and activities.

Several other standards address ergonomics and the design of computer and
industrial systems. For example, ISO 9241 [17] (Ergonomics of Human System
Interaction) is a collection of standards that includes documentation and
suggestions related to workplace ergonomics, visual displays, haptic and tactile
interfaces, and general software ergonomics. Workspace design is also covered in
ISO 6385 [18] (Ergonomics principles in the design of work systems), which,
among various work environments, describes and provides guidelines for the
design of work systems in production spaces, such as assembly line work.
Human-centered design is addressed in ISO/TR 16982 [19] (Ergonomics of
human-system interaction – Usability methods supporting human-centred design),
which describes human-centered usability methods and presents their advantages
and disadvantages. Lastly, ISO 26800 [20] (Ergonomics – General approach,
principles and concepts) presents ideas applicable to the design of tasks or
products, along with guidelines for making tasks, products and even work areas
efficient and safe.

Among the standards related to safe systems, we would like to mention two: (i)
ISO 12100 [21] (Safety of machinery – General principles for design – Risk
assessment and risk reduction); and (ii) ISO 45001 [22] (Occupational health and
safety). The former specifies risk assessment and risk reduction principles for
machinery design. The latter focuses on improving working conditions and
increasing employee safety by reducing workplace hazards. Both offer a framework
for controlling risk factors and for mitigating potential hazards and their
impact on the worker's physical and mental condition.

Last but not least, ISO/TS 15066 [23] (Robots and robotic devices –
Collaborative robots) complements the ISO 10218 standard [24] (Safety
Requirements for Industrial Robots) with a focus on collaborative robots.
Specifically, this TS describes four collaborative operation techniques:


• Safety-rated monitored stop

• Hand guiding

• Speed and separation monitoring

• Power and force limiting


Understanding this collaboration and knowing the risks is essential to avoid
incidents resulting from robot-human contact. Maximum speed control, emergency
stops and immediate stops on contact are some of the technologies applicable to
industrial robots and especially to cobots.
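
As an illustration of what speed and separation monitoring involves, the sketch below checks a measured operator distance against a protective distance built from the distance the operator can cover while the system reacts and the robot brakes, the robot's own travel, and a few safety margins. It is a simplified, constant-speed reading of the idea, written for this overview: the class, the function names and all numerical values are assumptions, and the normative formula and limit values must be taken from the standard itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SsmInputs:
    """Inputs for a simplified separation check (all values are assumptions)."""
    human_speed: float                 # m/s, operator speed towards the robot
    robot_speed: float                 # m/s, robot speed towards the operator
    reaction_time: float               # s, detection plus control reaction time
    stopping_time: float               # s, time the robot needs to come to rest
    stopping_distance: float           # m, robot travel while braking
    intrusion_distance: float = 0.0    # m, reach into the zone before detection
    operator_uncertainty: float = 0.0  # m, sensing error on operator position
    robot_uncertainty: float = 0.0     # m, error on robot position


def protective_distance(p: SsmInputs) -> float:
    """Minimum separation below which the robot should be stopped."""
    # The operator keeps approaching while the system reacts and the robot brakes.
    human_travel = p.human_speed * (p.reaction_time + p.stopping_time)
    # The robot keeps moving at its current speed until the stop is triggered.
    robot_travel = p.robot_speed * p.reaction_time
    return (human_travel + robot_travel + p.stopping_distance
            + p.intrusion_distance + p.operator_uncertainty
            + p.robot_uncertainty)


def must_stop(measured_distance: float, p: SsmInputs) -> bool:
    """True when the measured separation is below the protective distance."""
    return measured_distance < protective_distance(p)


if __name__ == "__main__":
    # Hypothetical cell parameters, chosen only to exercise the functions.
    cell = SsmInputs(human_speed=1.6, robot_speed=0.5, reaction_time=0.1,
                     stopping_time=0.3, stopping_distance=0.1,
                     intrusion_distance=0.1, operator_uncertainty=0.05,
                     robot_uncertainty=0.02)
    print(round(protective_distance(cell), 2), must_stop(0.8, cell))
```

The sketch makes one design point visible: lowering the robot speed or shortening its stopping time reduces the protective distance, which is why speed control and fast stopping appear together in collaborative cells.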


10.3.3 Other Relevant Standards 


Detailing all the relevant standards is beyond our scope, but several standards
that are not specifically related to AI, safety, security or industrial
environments are nevertheless of interest from an industry point of view.

For example, the ISO 9000 family [25] on quality management, a well-known
framework for demonstrating that products and services meet customer quality
requirements, has some impact on AI systems for industry. In fact, several
studies [7, 8] investigate the use of AI for the management of incomplete or
conflicting data and for the classification of audit findings. Information
produced by AI, or the use of AI to improve decision making, can therefore be
useful in improving quality management.

The commitment to responsibility and respect for society is covered in ISO
26000 [26] (Social responsibility), another non-technical standard that is not
strictly related to AI in manufacturing, but which takes into account company
reputation, commitment and health. As mentioned earlier for the ergonomics
standards, the health and safety of workers, especially when collaborating with
machines, is a point of interest for the ecosystem of safe and secure AI systems.

In line with sustainability, mentioned above, there is ISO 14001 [27]
(Environmental management systems). This standard, which like the previous one
is not specific to manufacturing or AI, focuses on the introduction of
environmental management systems to improve sustainability, reduce potential
conflicts related to the environment, and even create decent and healthy work
environments [9].

Finally, it is worth mentioning a standard on risk management in general: ISO
31000 [28] (Risk management). It is based on the evaluation and continuous
optimization of an organization's processes and of the way it manages risks.
The standard, which targets the operational continuity of the business, takes
into account, among other things, the safety of outcomes and even environmental
reputation.


10.4 Industrial Architectures and Infrastructures 


10.4.1 Reference Architecture Models 


The advent of Industry 4.0 and smart manufacturing has given rise to the
specification of various architectures and reference models for developing
digitally-enabled industrial systems, notably systems that leverage Cyber
Physical Production Systems (CPPS). These models describe the structuring
principles and the main building blocks of modern industrial systems. In most
cases these reference models lead to industrial systems that fall in the realm
of the Industrial Internet of Things (IIoT). The latter include data-intensive
components such as components based on BigData analytics and Machine Learning
(ML). As such, their structuring principles are relevant to the development of
AI systems. The following paragraphs review some of the most popular reference
models for architecting Industry 4.0 systems, including AI systems.


10.4.1.1 Reference Architecture Model Industrie 4.0 (RAMI 4.0) 


RAMI4.0 provides structuring concepts and a vocabulary for understanding
Industry 4.0 systems and their deployment. It describes the structure and main
elements of an Industry 4.0 system by means of a 3D layered model (Figure 10.1).
The three axes of the 3D model correspond to:


• The Architecture axis (Layers), which comprises six different layers indicating
functionalities at different granularities of the system, from the asset to the
business level.

• The Process axis (Value Stream), which illustrates the stages of an asset's
lifecycle, along with a corresponding value creation process based on IEC 62890.

• The Hierarchy axis (Hierarchy levels), which describes the breakdown structure
of assembled components based on a taxonomy that starts from the product and
goes up to the connected smart factory. The various levels are driven by the
DIN EN 62264-1 and DIN EN 61512-1 standards.


Figure 10.1. Reference Architecture Model Industry 4.0 (RAMI 4.0).


The architecture layers of RAMI4.0 include:


• The Asset Layer, which describes physical systems and components (e.g.,
machines, motors, software applications, spare parts).

• The Integration Layer, which links the physical and digital/cyber worlds
based on components like drivers and middleware.

• The Communication Layer, which deals with communications between the
integration and information layers. It employs network protocols (e.g., TCP/IP,
HTTP, FTP) over LAN and WAN networks, including wireless networks.

• The Information Layer, which provides (digital) information about sales,
purchase orders, suppliers, locations etc., along with information on materials,
machines and components that support the production.

• The Functional Layer, which comprises production rules, actions, processing,
and system control.

• The Business Layer, which is associated with the business strategy, the
business environment, and business goals of the enterprise, including
promotions, offers, pricing models and cost analysis.


Process Value Streams 


The Process axis deals with the lifecycle and processes of an object, which is
typically a product, a physical entity (e.g., machines and spare parts) or even
a virtual entity (e.g., documents and project plans). Every product needs to be
updated, restructured, redesigned, or reformed for maintenance purposes. In this
context, the Process axis specifies Types and Instances as its main concepts. A
product in a development state is referred to as a “Type”. Once moved to
production, it becomes an “Instance”. As illustrated in the RAMI4.0 cube, a
“Type” is also subject to maintenance activities. A product returns to the
“Type” state whenever it is redesigned or a new feature is added to it.


Hierarchical Levels 


The hierarchy levels of the corresponding axis are as follows: (i) Product,
which abstracts the product that is manufactured in a factory; (ii) Field
device, such as sensors and electronic devices that capture and/or control data
from the field; (iii) Control device, which corresponds to the Operational
Technology (OT) that manages input and output. Prominent examples are PLCs
(Programmable Logic Controllers) and DCSs (Distributed Control Systems); (iv)
Station, which enables operators to coordinate several processes and monitor
the results by means of automation systems such as SCADA; (v) Work Center,
which keeps track of manufacturing information and parameters that enable
quality management; (vi) Enterprise, which comprises the core business processes
(e.g., production planning, production scheduling, marketing and sales,
financial modules) that are usually managed through an ERP system; and (vii)
Connected World, which deals with the interlinking of all stakeholders as part
of their supply chain interactions (including information sharing and exchange
among them).
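
One way to read the three RAMI4.0 axes is as coordinates that locate an asset inside the cube. The sketch below models this idea with plain Python enumerations and a small record type, including the Type/Instance transitions described above. The class and function names are hypothetical and chosen for this illustration only; RAMI4.0 itself does not prescribe any such programming interface.

```python
from dataclasses import dataclass, replace
from enum import Enum, IntEnum


class Layer(IntEnum):
    """Architecture axis, ordered from the asset up to the business level."""
    ASSET = 1
    INTEGRATION = 2
    COMMUNICATION = 3
    INFORMATION = 4
    FUNCTIONAL = 5
    BUSINESS = 6


class HierarchyLevel(IntEnum):
    """Hierarchy axis, ordered from the product up to the connected world."""
    PRODUCT = 1
    FIELD_DEVICE = 2
    CONTROL_DEVICE = 3
    STATION = 4
    WORK_CENTER = 5
    ENTERPRISE = 6
    CONNECTED_WORLD = 7


class LifecycleState(Enum):
    """Process axis: development ("Type") versus production ("Instance")."""
    TYPE = "type"
    INSTANCE = "instance"


@dataclass(frozen=True)
class RamiPosition:
    """Coordinates of one asset in the three-axis model."""
    name: str
    layer: Layer
    level: HierarchyLevel
    state: LifecycleState


def move_to_production(item: RamiPosition) -> RamiPosition:
    """A developed Type becomes an Instance once it is produced."""
    return replace(item, state=LifecycleState.INSTANCE)


def redesign(item: RamiPosition) -> RamiPosition:
    """Redesign or a new feature returns the item to the Type state."""
    return replace(item, state=LifecycleState.TYPE)


if __name__ == "__main__":
    plc = RamiPosition("line-3 PLC", Layer.ASSET,
                       HierarchyLevel.CONTROL_DEVICE, LifecycleState.TYPE)
    in_service = move_to_production(plc)
    print(in_service.state.value, redesign(in_service).state.value)
```

Using ordered enumerations keeps comparisons such as "is this element above the station level?" straightforward, while the frozen record makes each lifecycle change explicit as a new value.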


10.4.1.2 The Industrial Internet Consortium Reference Architecture (IIRA)


The IIRA specifies a common architecture framework for developing interoperable
IoT systems for different vertical industries. It is an open, standards-based
architecture, which has broad applicability. The latter makes it a vehicle for
interoperability, mapping and practical deployment of IoT technologies, as well
as for standards development. To ensure its broad applicability, the IIRA is
fairly generic, abstract and high-level. Hence, it can be used to drive the
structuring principles of an IoT system without specifying its low-level
implementation details. It is also a very good vehicle for communicating
concepts and facilitating stakeholder collaboration.

Based on the analysis of multiple use cases in different sectors, the IIRA
presents the structure of IoT systems from four viewpoints, namely the business,
usage, functional and implementation viewpoints. Among these, it is the
functional viewpoint that specifies the functionalities of an IoT system. To
this end, the functional viewpoint specifies distinct functionalities in the
form of so-called “functional domains”. Functional domains (Figure 10.2) can be
used to decompose an IoT system into a set of important building blocks, which
are applicable across different vertical domains and applications. As such,
functional domains are used to conceptualize concrete functional architectures.
The IIRA decomposes a typical IoT/IIoT system into five functional domains,
namely a control domain, an operations domain, an information domain, an
application domain and a business domain. The implementation viewpoint of the
IIRA is based on a three-tier architecture, which follows the edge/cloud
computing paradigm. It includes an edge, a platform and an enterprise tier.


Figure 10.2. Functional domains in the IIRA.


10.4.1.3 The OpenFog Reference Architecture (RA) 


The OpenFog Consortium was a consortium of high-tech industrial enterprises and
research/academic institutions that collaborated towards standardizing and
promoting the fog computing paradigm. Since December 2018, the OpenFog
Consortium and the IIC have joined forces.¹⁰ Fog computing is directly
associated with IoT, as it leverages fog nodes (i.e., essentially IoT devices)
in order to enable reliable, low-latency IoT applications. Fog computing
alleviates the limitations and drawbacks of conventional cloud computing in
scenarios where low latency and processing close to the field are required. The
RA of the OpenFog Consortium illustrates the structure of fog computing systems.
It presents how fog nodes can be connected partially or fully to enhance the
intelligence and operation of an IoT system. Moreover, it presents solutions for
growing system-wide intelligence away from low-level processing of raw data. The
RA is described in terms of different views, including functional and
deployment views. OpenFog-compliant systems include some cross-cutting
functionalities (i.e., functionalities that are applied across all layers of an
IoT/OpenFog system). These cross-cutting functionalities are called
“perspectives”.


10. https://www.iiconsortium.org/press-room/12-18-18.htm


10.4.1.4 ISO/IEC CD 30141 Internet of Things Reference Architecture (IoT RA)


ISO/IEC 30141:2018¹¹ provides a standardized IoT Reference Architecture using
a common vocabulary, reusable designs and industry best practices. It uses a
top-down approach, beginning with collecting the most important characteristics
of IoT and abstracting those into a generic IoT Conceptual Model (CM). The
latter has been derived from a heuristic analysis of system characteristics that
are common to most IoT systems (e.g., auto-configuration, discoverability,
scalability). The CM describes typical IoT entities or actors, along with their
relationships. The architecture is described by means of five complementary
views, i.e., functional, system, communications, information and usage.


10.4.1.5 BigData Value Reference Model 


The BDV Reference Model provides the means for representing AI, ML, and
BigData analytics pipelines [29]. It distinguishes between two different kinds
of elements: (i) elements that are at the core of the BDVA (Big Data Value
Association); and (ii) features that are developed in strong collaboration with
related European activities. The model is structured into horizontal and
vertical concerns:


• Horizontal concerns focus on the data processing chain, starting with data
collection and ingestion, and extending to data visualisation. Horizontal
concerns do not imply a layered architecture. For instance, visualisation may be
applied directly to collected data without the need for intermediate functions
like data processing and analytics.

• Vertical concerns address cross-cutting issues that apply to all horizontal
functions and may include non-technical aspects.
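
The remark that horizontal concerns do not imply a layered architecture can be illustrated with a small sketch in which each concern is an independent step and a pipeline is simply a chosen sequence of steps. The function names and the sample readings are invented for this illustration and are not part of the BDV Reference Model.

```python
from statistics import mean


def collect():
    """Stand-in for data collection and ingestion (values are invented)."""
    return [
        {"machine": "press-1", "temp_c": 71.0},
        {"machine": "press-1", "temp_c": 73.0},
        {"machine": "press-2", "temp_c": None},   # missing reading
        {"machine": "press-2", "temp_c": 65.0},
    ]


def process(records):
    """Data processing: drop records with missing readings."""
    return [r for r in records if r["temp_c"] is not None]


def analyse(records):
    """Analytics: average temperature per machine."""
    by_machine = {}
    for r in records:
        by_machine.setdefault(r["machine"], []).append(r["temp_c"])
    return [{"machine": m, "temp_c": mean(v)}
            for m, v in sorted(by_machine.items())]


def visualise(records):
    """Text 'visualisation': one line per record."""
    return [f"{r['machine']}: {r['temp_c']}" for r in records]


def run(source, *steps):
    """Apply the chosen steps in order; no step is mandatory."""
    data = source()
    for step in steps:
        data = step(data)
    return data


if __name__ == "__main__":
    # Visualisation applied directly to collected data.
    print(run(collect, visualise))
    # Full chain: processing and analytics before visualisation.
    print(run(collect, process, analyse, visualise))
```

Because `run` accepts any subset of steps, the same building blocks serve both a quick look at raw data and a full analytics chain, which is the flexibility the horizontal concerns are meant to convey.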


11. https://www.iso.org/standard/65695.html 
Figure 10.3. BDVA Reference Architecture Model for BigData Analytics and Machine
Learning [29].

The BDV Reference Model (Figure 10.3) is compatible with reference
architectures for AI, most notably with the ISO JTC1 WG9 Big Data Reference
Architecture.¹²


10.4.2 Infrastructures for Industrial Systems 


As far as infrastructures are concerned, we can mention GAIA-X¹³ (a Federated
Data Infrastructure for Europe). GAIA-X is an initiative launched by
representatives from business, science and politics at the European level to
create a proposal for the next generation of a European data infrastructure and
thus enable EU companies to compete globally, exploiting data and services made
available in an open, digital and trusted ecosystem. GAIA-X connects centralised
and decentralised infrastructures in order to turn them into a homogeneous,
user-friendly system. The resulting federated form of data infrastructure
strengthens the ability to both access and share data securely and confidently.
Specifically for the Industry4.0 sector, the GAIA-X infrastructure opens
opportunities for the development of new solutions for Smart Manufacturing,
Supply Chain collaboration, Shared Production, Predictive Maintenance, Connected
ShopFloor and more.


12. https://www.iso.org/files/live/sites/isoorg/files/developing_standards/docs/en/big_data_report-jtc1.pdf
10.5 Conclusions


The purpose of this chapter was twofold. On the one hand, it reviewed some of
the standards and architectures applicable to manufacturing use cases and pilots
that are being investigated in research and engineering projects such as STAR.
On the other hand, it provided a series of pointers and a brief overview of the
current ecosystem of standards, so that SMEs that want to implement safe and
secure AI systems, with or without middleware such as STAR, can find some
initial information.

Standards evolve: existing ones are updated and new ones appear. It is therefore
advisable to stay aligned with the standards used by suppliers and customers,
and to perform periodic watch activities. For example, in the short to medium
term there are several standards under development that may be of interest for
the development of AI applications in industry. Several of those developed
within ISO/IEC JTC 1/SC 42 Artificial intelligence,¹⁴ such as ISO/IEC AWI TR
5469 Artificial intelligence – Functional safety and AI systems¹⁵ or ISO/IEC
DTR 24027 Information technology – Artificial Intelligence – Bias in AI systems
and AI aided decision making,¹⁶ may have a high impact on future AI for safe
and secure systems.
Chapter 11


AI That Works: The Symbiosis
of Functionals & Non-Functionals
as Main Success Factor


By Arthur van der Wees, Anna Ida Hudig and Celine Prins 


With any emerging technology, or combination of existing technologies, one tends
to focus on the technology itself. However, the technology should not be the
focal point, as it is not in itself the solution. This also goes for Artificial
Intelligence and the functionalities and capabilities it can bring, or otherwise
promises to bring, enable, facilitate and augment, for instance in the vast
supply chains, manufacturing, logistics, maintenance and related Industry 5.0
domains. This chapter presents notions and guidance to make AI work: not just
to function, but also to be prepared by design, with embedded non-functionals,
for when things go wrong and for other risks it may encounter or cause. All
this is meant for AI to help make ‘it’ work. This tiered approach provides value
propositions that effectively address societal challenges, for which relevant AI
functionalities in symbiosis with risk-based non-functionalities can be
designed, deployed and continuously improved. In the Industry 5.0 domain this
approach aims to result in valuable and feasible, human-centric, secure, safe,
sustainable and otherwise trusted and trustworthy AI-supported intelligent
ecosystems. With that, the symbiotic, dynamic equation of both functionals and
non-functionals is one of the main success factors for future-proof Industry 5.0
and related value creation.


11.1 Introduction 


11.1.1 How to Make it Work? 


While this chapter mainly aims to provide guidance on making Artificial
Intelligence work, in order to get there it is important to first understand how
to make ‘it’ work, and what ‘it’ actually means.


11.1.2 Everything is Connected 


Alexander Von Humboldt, the 18th-century scientist, naturalist and explorer, 
world famous in his time, was one of the first to recognize and explain the fun- 
damental functions of the mountains, rivers and rain forests for the ecosystem 
and climate, claiming that the world is a single intertwined and interconnected 
organism. 

Everything is connected; everything is part of the system, he essentially
proclaimed [1]. This is the concept of nature as we know it today. According to
Von Humboldt, everything, down to the smallest creature, has its role and
together makes the whole, in which humankind is just one small part of the
holistic puzzle.

Integrated ecosystems sustain life and provide us with an amazing habitat.
People and the ecosystems we live in, in this Digital Age, have great
capabilities to improve and sustain the quality of life for all, provided we
interact and leave no one behind. As we face many societal challenges that
urgently need to be dealt with, we need a climate for change. For these
challenges, Artificial Intelligence (AI) and other or related knowledge,
processes, technologies, human intelligence and experience may be excellent
enablers and facilitators.


11.2 Where to Start? 


11.2.1 Societal Challenges 


While this chapter is not about covering and discussing societal challenges in
general, one needs valued use cases that address one or more of those
challenges, because technology in itself is not a use case. It never is. The
way forward is to have a use case-driven, people-centric, stakeholder-centric,
persona-centric, societal-centric, data-centric, sustainable,
technology-agnostic and accountable approach.


Figure 11.1. Intertwined societal challenges.

But where to start? What can an entrepreneur, company, sector, community or
other group in society and the economy do to create an overall positive, green,
digital and resilient impact while also having a viable and economically
sustainable value model, with related business models and (financial and other)
feasibility models to get things started, going, trusted, growing, scaling,
resilient and future-proof? Having a big vision and focusing on the horizon is
important, but having a clear starting point is one of the main prerequisite
success factors.

With that in mind, it is recommended to start by identifying and establishing
the particular challenge(s) one would like to focus on, for instance by using
the 12 Societal Challenges for Future of Living [2], as visualised in Figure
11.1. These are in line with both the vision of the European Commission and
the United Nations' Sustainable Development Goals (SDGs) [3]. These Societal
Challenges are obviously intertwined and interconnected.

When analysing the various Societal Challenges that relate to the vast supply
chains, manufacturing, logistics, maintenance and related Industry 5.0 domains [4],
in combination with digital ecosystems with certain AI capabilities or potential
anywhere upstream, midstream or downstream in the Industry 5.0 ecosystems, at
least the following are worth considering:


SC1: Abundance & Scarcity
SC2: Circular Economy
SC3: Climate & Sustainability
SC4: Demography
SC7: Inclusion
SC8: Mobility & Logistics
SC9: Resilience (Climate, Community & Cyber)
SC10: Safety & Security
SC11: Skills & Jobs


As two examples, the next paragraphs briefly dive into SC4: Demography and
SC11: Skills & Jobs, and into where and why AI in an Industry 5.0 context may be
valuable, appreciated and even necessary.

They also demonstrate that each Societal Challenge is both complex in itself and
intertwined with the other Societal Challenges, so that addressing one will also
result in addressing or otherwise impacting parts of others.


11.2.1.1 H2M & M2H cooperation 


When focusing on the Societal Challenge of Demography, three questions that
come to mind are (a) how to deal with an expected decrease of population in
multiple parts of Europe, (b) what will the various combinations and interfaces 
between humans and machines, and vice versa, look like (H2M, M2H, H2M2M 
and the like), and (c) will we see social prosperity or social disruption? 

Within the European Union, the working-age population is declining. It is expected
to shrink by 13.5 million (or 4%) by 2030 compared to 2018 [5], as the total EU
population will shrink by 5% between 2019 and 2070, to 424 million inhabitants [6].
Furthermore, the EU’s demographic ratio between people above 65 years old and
those aged 20-64 is expected to increase from one to four in 2010 to one to (less
than) two in 2070 [7]. Additionally, the development of shorter working weeks
could cause a 2% reduction in labour supply [8].

So, more productivity and efficiency are expected from less [9]. These developments
will affect not only the per capita gross domestic product (GDP) but also welfare and
the quality of life. Just keeping the current status quo in place will be a huge
challenge. According to a Working Paper of the Organisation for Economic
Cooperation and Development (OECD) on ageing and productivity growth, in many
OECD regions the actual growth rates recorded in recent years have been lower than
the productivity growth required to maintain per capita GDP levels. One reason for
this is that ageing also has a direct negative impact on productivity growth, with
the effect being concentrated in urban areas [10].

Combining and deploying innovative processes, data and technologies to augment
the capabilities of people, industry, the supply side and the demand side can be a
helpful mechanism to compensate for this expected decrease in productivity and in
levels of welfare and quality of life.

The above not only demonstrates that there are huge potential and markets for AI,
intelligent systems, cognitive computing, robotic process automation, distributed
intelligence, autonomous systems, cobots, and other or related knowledge,
processes, technologies and experience. It also demonstrates that there is a need for
AI- and other technology-supported H2M, M2H, H2M2M and other interaction,
communication and cooperation to help address the current and upcoming
challenges, avoid social disruption, and improve social prosperity.


11.2.1.2 Evolution or revolution? 


When focusing on the Societal Challenge of Skills & Jobs, three questions that 
come to mind are (i) how will the future of work change the industrial sector, and 
the looks of our urban and rural societies, (ii) how to keep the veins of trade and 
human values running through our communities, and (iii) will technology displace 
more jobs in 10 years than it creates, or vice versa? 

According to the OECD, 65% of the children in school today will have jobs that
have not been invented yet [11]. This indicates that we are apparently not yet sure
what the future will look like, but that we do acknowledge that society will look
very different in a decade. The World Economic Forum (WEF) points out that the
top 10 most essential skills of the near future include analytical thinking, empathy,
creativity, reasoning, complex problem-solving, self-management, and technology
development and use [12]. Clearly, this list reflects a more intertwined combination
of the right and left parts of the brain than currently seems common.

Intelligent supply chains, rapid innovation production, integrated logistics,
prognostic health monitoring, predictive maintenance and other Industry 5.0
domains have the capabilities to address Societal Challenges and improve
productivity, safety, security, sustainability and other efficiencies [13]. New
concepts, models and processes supported with AI and other digital capabilities are
not a nice to have; they are a need to have. They will certainly support and augment
the workforce, yet they will also challenge and change it, in an evolutionary or
revolutionary way.

However, Societal Challenges and related SDGs are challenging and complex
problem sets. There is no one solution. There is no one entrepreneur, no one
corporation, nor one other group with the answer. There is no one technical fix.
Nor will there be one AI fix. This is all about working together. Each challenge
requires diverse teams and capabilities. Nothing less. This is all about walking and
achieving outcomes.


11.2.2 People, Process & Technology 


The current real-life world in this Digital Age seems more complex than ever. It is,
and will more and more be, the symbiosis of physical, physical-cyber, cyber and
cyber-physical worlds, with ever-increasing capabilities and possibilities. Therefore,
it is important to identify, map and plot one of the all-present common
denominators in this Digital Age.


Figure 11.2. Tetrahedron: people, process, technology & data.

Also for the European Commission, from the digital perspective, the common 
denominator and the main priority is: data [14]. The data dimension is the dynamic 
and all-present dimension that is relevant everywhere in this Digital Age. It offers 
huge opportunities, benefits and gains. 

Nonetheless, in the generic structure of ‘people, process, technology’, the
component ‘data’ is generally still overlooked. Yet in the AI domain it is obviously
one of the main ingredients and enablers. Data, structured data (being information),
combined information (being knowledge), and used data, information and knowledge
(being experience), in all their forms and categories and with all their various
values, will generally run through all activities and ecosystems, from end to end.
The time has come for us all to move away from just thinking in the traditional, and
long-outdated, mode of ‘people, process and technology’.

For addressing any Societal Challenge, one will need a data strategy; data always
needs to be taken into the equation, as visualised in Figure 11.2.


11.3 Make it Work 


11.3.1 Make it Function 


Technology changes the world at a fast pace. Thirty years ago the internet became
publicly available through the World Wide Web. It was not designed. It just
happened and evolved. Meanwhile, it has fundamentally changed the world as we
then knew it. Today, more than 50% of the world’s population has the ability to
access and use these digital technologies and networks, and the number continues to
increase every day.


Figure 11.3. Digital ecosystems are interconnected vessels.

Societies and individuals can benefit in all manner of ways from access to
knowledge, people and organizations on a local and global level. More than that,
digital has become a must-have for people, society and the economy. Indeed, digital
technology and networks foster innovation. Digital platforms, AI, robotics, edge
computing and the internet of things (IoT) are further expediting this process by
connecting, inter-connecting and hyper-connecting individuals, organizations,
communities, societies and data with tens of billions of objects and entities.

All these technical capabilities and related digital ecosystems generally comprise
a technical stack that to some extent can be visualized as in Figure 11.3. Such a
stack is made up of some combination of the various forms of data, together with
software-enabled algorithms that have sufficient computing power (centralized,
decentralized, or distributed on the Edge or in IoT devices), and interfaces,
connectivity and infrastructure where necessary.

So, with the clear mission to address Societal Challenges, one has done the
initial preparations to make it work. When furthermore preparing the relevant
kitchen tools, cooking ingredients, basic cooking skills and a plan for what to cook,
one can come up with the technical functions, the functional specification, the
technical requirements, the technical specification, and thereafter the actual
development and engineering. Right after, it is time to demonstrate that it
functions, and one is all set. Right? We all know how difficult it already is to even
come to that point.


11.3.2 Does It Work? Or Does It Just Function?


However, does it actually work? Or does it just function? What if it does not
function?

AI technology is an inherent component of Industry 5.0. However, even if the
technology itself is at the right technology readiness level, the readiness of a
technology in itself, that is, whether it has been proven to function well in an
operational environment, does not guarantee its success [15]. Studies show that
adding AI to a technology or process could strengthen its capacity to reach the
envisioned outcome, yet it will just as well amplify the risk of negative impact.
Digital technologies and intelligent networks are not immune to error, evil, incidents
or other risk. Nor are they immune to incidental, incremental or disruptive change,
whether caused by internal or external factors. The many ‘What-If’ scenarios are
generally not considered sufficiently, and are not re-run afterwards on a consistent
and continuous basis.

Making it work implies having both the functionals and the non-functionals
included, by design and by default, and taking these into consideration and
addressing them end-to-end: upstream, midstream and downstream, in the Von
Humboldt spirit.

Although new and seemingly burdensome for some, it will for sure be beneficial in
order to truly make it work, with AI in the equation. Before one notices, it will
become second nature. The ‘it’ in ‘make it work’ is not AI or other technological
functionalities or capabilities; it is a valued use case that addresses Societal
Challenges of any kind.


11.3.3 Risk in Cyber-Physical and Other Digital Ecosystems 


11.3.3.1 Risk in the digital age 


Although this chapter does not aim to give a full overview of, and perspectives on,
risk, risk mitigation and risk management, it is important not to see risk as something
necessarily negative. It is an integral part of the equation and, with that, an enabler
and facilitator of anything that works in a trusted, trustworthy and accountable
way. It gives essential and valuable insights into what may happen or may go wrong,
what people or society like or fear, et cetera. For sure, in the AI or AI-supported
domain that is an essential success factor.

The magnitude of risks, determined by their probability as well as their impact, is
very much context and application dependent. To prepare for and mitigate potential
harm, to embed preparedness for foreseen and unforeseen situations, and to make AI
systems resilient and future-proof, it is necessary that they are designed and deployed
guided by trust principles. These non-functionals are principles that consistently
preserve trust, trustworthiness and engagement of all relevant stakeholders. Examples
of such principles are security, safety, privacy, transparency, auditability,
sustainability and robustness. There are several hundred trust principles. These can be
found in best practices, guidelines, white papers, standards and regulations, but also
in common practice and nature.

Two major challenges in the AI design and deployment are (1) to map the rele- 
vant risks accurately and comprehensively throughout the system’s entire lifecycle, 
and (2) to incorporate non-functionals by design. 
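
As a minimal sketch of the notion that the magnitude of a risk is determined by
both probability and impact, the following Python fragment combines the two into a
single level. The three-level scale, the numeric scores and the thresholds are
illustrative assumptions and are not prescribed by this chapter; in practice they are
context and application dependent.

```python
from enum import IntEnum


class Level(IntEnum):
    """Illustrative three-level scale (an assumption for this sketch)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def risk_magnitude(probability: Level, impact: Level) -> Level:
    """Combine probability and impact into one risk level.

    The product score and its thresholds are hypothetical; they would need
    to be calibrated per context and application.
    """
    score = int(probability) * int(impact)  # possible values: 1, 2, 3, 4, 6, 9
    if score >= 6:
        return Level.HIGH
    if score >= 3:
        return Level.MEDIUM
    return Level.LOW
```

With these assumed thresholds, a low-probability but high-impact event is still
classified as at least medium, which reflects the point that rare events should not
be ignored when their impact is severe.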


11.3.3.2 Risk segmentation; creating insights & oversight 


Risk is not a four-letter word and, even in the AI context, deserves its own series
of books. It is at least useful to segment the various AI-related dimensions of this
Digital Age in order to get some relevant oversight and insight. For the purposes of
this book in general, and of this chapter in particular, which argues for a holistic,
end-to-end ecosystem approach similar to the notions of Von Humboldt, the initial
segmentation is done in four (4) segments, as set forth below:


A. Non-connected, which is a stand-alone device, tool, machine, appliance or
application that does not have connectors or connectivity that can connect
to the internet or other external network or resources;

B. Connected, where a device, tool, machine, appliance, application or system
may be connected, via the internet, to centralised databases, cloud
infrastructure and other centralised systems;

C. Inter-connected, where several edge devices, tools, machines, appliances,
applications or systems are connected with each other via orchestrated or
federated systems; and

D. Hyper-connected, where numerous far edge and other IoT devices, tools,
machines, appliances, applications or systems are directly connected with
each other via distributed (computing and related) ecosystems of ecosystems.
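
The four segments above can be sketched as a simple classification in Python. The
`Asset` fields and the decision rules are hypothetical simplifications introduced
for illustration only; the chapter defines the segments in prose and does not
prescribe such attributes.

```python
from dataclasses import dataclass
from enum import Enum


class Segment(Enum):
    NON_CONNECTED = "A"
    CONNECTED = "B"
    INTER_CONNECTED = "C"
    HYPER_CONNECTED = "D"


@dataclass
class Asset:
    """Hypothetical description of a device, machine, application or system."""
    has_external_connectivity: bool  # can reach the internet or another external network
    peer_links: int                  # number of direct links to other devices or systems
    distributed_ecosystem: bool      # part of a distributed ecosystem of ecosystems


def classify(asset: Asset) -> Segment:
    """Assign a segment using simplified, assumed decision rules."""
    if not asset.has_external_connectivity:
        return Segment.NON_CONNECTED
    if asset.distributed_ecosystem:
        return Segment.HYPER_CONNECTED
    if asset.peer_links > 0:
        return Segment.INTER_CONNECTED
    return Segment.CONNECTED
```

For example, `classify(Asset(True, 0, False))` yields `Segment.CONNECTED`, while an
asset without any external connectivity always falls into segment A.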


For each of these segments, various value cases, business models, feasibility
models and therefore use cases can be identified and created in the AI-supported
Industry 5.0 domain. Each segment has its own values, benefits, efficiencies,
inefficiencies, et cetera.

The segmentation set out above is obviously not the only one possible. Various
other segmentations are relevant to consider as well, such as real-time,
near-real-time or not. This segmentation may be relevant when near-real-time
autonomous 3D printing is considered, or when real-time prognostic health monitoring
or related integrated logistics support are relevant. Other segmentations that can be
considered are single-vendor, multi-vendor, OEM, public, private, public-private,
et cetera.


Figure 11.4. Cyber-physical ecosystem security risk spectra.


11.3.3.3 Risk classification spectra: a multi-layered approach 


When going back to the above-mentioned segment of Hyper-Connected devices and
taking a risk perspective on those, a methodology for high-level quality risk
classification is to take a multi-layered approach and to do such risk classification
per spectrum, starting with the risk classification of the connectors and connectivity
of the IoT device itself. Even though AI capabilities may not yet be in the equation,
it is essential to understand the various risks that are embedded in or could arise
from such an IoT device. Subsequently, other risk spectra should be considered and
risk classified, as visualised in Figure 11.4.

Especially further downstream there may be risk spectra that are not relevant;
however, if such a spectrum may become relevant later in the life cycle of the IoT
device, it is advisable to keep it in and already do the risk classification for
that spectrum. In general, three main risk levels are used: low, medium and high.
Based on the outcome of (i) a risk classification for each spectrum, and (ii) the
interim outcome of the various risk classifications up to Spectrum 13 (AI
Capabilities), the baseline risk classification can be established.

Based on that baseline, the AI Capabilities risk classification can be done, followed
by the subsequent risk spectra. The holistic perspective constitutes the Combined
Risk Classification, on the basis of which one can consider and organise security,
safety, privacy and related technical and organisational measures.
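
The multi-layered classification described above can be sketched in Python as
follows. The chapter does not state how the per-spectrum levels are combined, so
taking the most severe level (worst-case aggregation) is an assumption of this
sketch, and all spectrum numbers other than Spectrum 13 (AI Capabilities) are
hypothetical placeholders.

```python
RANK = {"low": 0, "medium": 1, "high": 2}
AI_SPECTRUM = 13  # Spectrum 13 (AI Capabilities), as named in the text


def highest(levels):
    """Return the most severe level; worst-case aggregation is an assumption."""
    return max(levels, key=RANK.__getitem__)


def classify_spectra(levels_by_spectrum):
    """Return (baseline, combined) risk classifications.

    levels_by_spectrum maps a spectrum number to "low", "medium" or "high".
    The baseline covers the spectra before Spectrum 13; the combined
    classification covers all spectra, including AI Capabilities and later.
    At least one spectrum before Spectrum 13 must be present.
    """
    baseline = highest(
        level
        for number, level in levels_by_spectrum.items()
        if number < AI_SPECTRUM
    )
    combined = highest(levels_by_spectrum.values())
    return baseline, combined


# Hypothetical example: two early spectra, AI Capabilities and one later spectrum.
baseline, combined = classify_spectra({1: "low", 2: "medium", 13: "high", 14: "low"})
```

In this example the baseline is "medium" and the combined classification is "high",
which illustrates how adding AI capabilities can raise the overall classification
above the baseline.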

Any technical and organisational measure taken or to be taken can include, cause or
otherwise trigger risk, either by itself or as a consequence. It is therefore
recommended to double-loop the particular set of measures, first of all to assess if
and to what extent these may have a detrimental impact. The subsequent section on
‘Double-Looped S.I.M.’ provides a practical and proven methodology for this.

Given the dynamics of IoT, and even more so of AI-supported IoT and IoT
ecosystems, any of the risk classification spectra can be expected to trigger, change
or otherwise show relevant dynamics, such as (A) technical or other threats and
vulnerabilities, (B) anomalies of actors and other stakeholders, or updates or
upgrades in code, datasets or attributes, or (C) changes in regulatory standards,
policies or other relevant best practices. It is therefore recommended to double-loop
here as well, including those spectra that are or may be related to, or are otherwise
(inter)dependent on, the particular spectrum. It is also recommended to continuously
monitor the risks and, where necessary, to double-loop thereafter to keep the
security measures up to date and resilient.

In any case, the segments that have AI capabilities of any kind, whether
non-connected, connected, inter-connected or hyper-connected, are for sure game
changing, and their non-functional and functional requirements have to be addressed
together. The winner will be the one who fully understands the societal challenges
at hand and the related sectoral requirements.


11.3.4 Good-Case, Bad-Case & Worst-Case Scenarios 


11.3.4.1 Dynamic scenarios; identify, structure, act & double-loop


The world is not perfect. Nothing is. Nature and its dynamics are used to that, and 
so are humans — even though this is forgotten once in a while. Every professional 
and every other person has a lot of individual capabilities to assess risks, including 
probability, potential impact and severity thereof. 

However, with an adequate number of individuals coming from diverse groups
of people with diverse backgrounds, knowledge, skills and expertise, one can do
even more comprehensive risk assessments. Multiple and varied brain power leads
to more perspectives, angles, understanding of interdependencies and other
insights. This is very necessary, as (i) the ever-changing and ever-evolving systems
and attack surfaces, and (ii) assessing risk and making informed decisions to choose,
implement and keep up to date the right set of technical and organisational
measures, become more and more complex as well.

Risk is generally linked to accidents. These can have a low probability, but when
they happen they can result in high impact and can even trigger multiple severe
external consequences. In his well-known book Normal Accidents, Perrow uses the term
‘normal accidents’ in part as a synonym for ‘inevitable accidents.’ This
categorization is based on a combination of features of such systems: interactive
complexity and tight coupling. Normal accidents in a particular system may be common
or rare, but the system’s characteristics make it inherently vulnerable to such
accidents, hence their description as ‘normal’ [16].


Figure 11.5. Double-Looped S.I.M.

However, risk is not just linked to accidents. It can also be a consequence
of action, inaction, error, omission, ignorance, stupidity, recklessness, intention,
malice, or unintended but foreseeable consequences.


11.3.4.2 Double-looped scenario plotting


To consider risk in order to take technical and organisational measures by design,
the scenario plotting methodology of Double-Looped S.I.M. can be used, preferably
together with diverse groups of people with diverse expertise, and at different
moments in time and different times during the day. S.I.M. means: Scenario,
Impact, Measures. The double-looping refers to the notion that any measure in
itself can be a vulnerability and can even increase risk or create new risk and related
detrimental impact. So, every measure deserves its own S.I.M. cycle. The
Double-Looped S.I.M. is visualised in Figure 11.5.
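
The idea that every measure deserves its own S.I.M. cycle can be sketched as a
recursive data structure in Python. The class names, fields and example content
below are hypothetical and serve only to illustrate the double loop; they are not
part of the methodology as published.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Measure:
    description: str
    # Second loop: scenarios in which the measure itself creates or increases risk.
    scenarios: List["Scenario"] = field(default_factory=list)


@dataclass
class Scenario:
    description: str
    impact: str  # e.g. "low", "medium" or "high" (illustrative scale)
    measures: List[Measure] = field(default_factory=list)


def walk(scenario: Scenario, depth: int = 0) -> List[str]:
    """Flatten a S.I.M. tree into indented lines, one S.I.M. cycle per measure."""
    lines = [f"{'  ' * depth}Scenario: {scenario.description} (impact: {scenario.impact})"]
    for measure in scenario.measures:
        lines.append(f"{'  ' * (depth + 1)}Measure: {measure.description}")
        for nested in measure.scenarios:
            lines.extend(walk(nested, depth + 2))
    return lines


# Hypothetical example: a measure that introduces a new scenario of its own.
authenticate = Measure("Authenticate the sensor feed")
authenticate.scenarios.append(
    Scenario("Authentication outage halts the line", "medium")
)
root = Scenario("Sensor feed is spoofed", "high", [authenticate])
report = walk(root)
```

Printing `report` line by line shows the original scenario, the measure taken, and
the new scenario that the measure itself gives rise to, each one level deeper.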

One will probably find numerous scenarios one would not immediately think about,
as generally the worst-case scenarios get the most attention even though their
probability may be close to 0%. Good-case scenarios are generally forgotten, although
they may cause (initially unforeseen) impact and negative consequences as well. With
the nowadays familiar race to be the first in a market, AI and its functions and
applications that have been designed, seemingly, ‘for good’ may carry severe negative
societal, safety, security, personal, economic, ecological and other risks, impact and
consequences. Those who create AI are humans, who generally work with a certain
focus and under certain pressure (including from investors, grant providers and
others), while not considering, or not being allowed to consider, other perspectives.
Furthermore, new or emerging technology tends to be overconfident. In the case of AI
capabilities, even the AI itself may be overconfident [17].

Other scenario examples can, and should, be taken from real-life accidents and other
lessons learned, such as the story of Wanda Holbrook, a maintenance technician who
was killed by an unexpectedly moving robot at a car parts manufacturer’s site [18].
Unfortunately, the non-functionals, in this case personal safety and security, were
not taken into the symbiotic equation, and the total imbalance became awfully clear.


11.3.4.3 Balancing out functionals & non-functionals 


The question ‘what happens if things go wrong?’ is not one most designers,
developers or marketeers wish to ask themselves. What is more, in the AI domain
incidents are expected to have an even more severe impact than in digital domains
without AI capabilities. These notions also hold for any emerging or relatively new
technical capabilities, not only for the AI domain. Good and extensive scenario
plotting and mapping are a prerequisite, also from ethical and accountability
perspectives.

The appropriate balance between functionalities and benefits on the one hand,
and non-functionals and impact mitigation on the other hand, with appropriate
security and other prevention-, risk- and impact-based measures, metrics and
measurements in place, will need to be found per context, and meanwhile monitored
and challenged continuously. This will increase transparency, reduce unpleasant
surprises in the Digital Age, and most of all increase trust and trustworthiness.
Making it work, including the appropriate functionals, non-functionals and related
accountability, is complex, but that is where the true, huge potential lies, for all,
and for the future of mankind and our planet.


11.3.5 Human-Centric Co-Creation Cycle: Success by Design 


11.3.5.1 Cat & Mouse 


As in cat and mouse games, malicious actors immediately change and improve
their ways as soon as they are countered. In AI, but also in any other technology or
digital ecosystem, the eternal cat and mouse game will continue, intensify and
accelerate. ‘AI for Good’ can easily be converted into ‘AI for Malicious’, and vice
versa. Therefore, future networks will indeed be smarter and safer, whilst at the same
time those networks will be more vulnerable. This race will not be a sprint; it will
be a permanent marathon with an unknown number of sprints. In the Digital Age, these
eternal games will continue, increase in dynamics and speed, and otherwise also
expand exponentially.

In order to identify risk, to ensure that impact is mitigated or contained and to
avoid that vulnerabilities are misused, the first focus should be on trying to avoid
risks and vulnerabilities in the first place, preferably by design and in a continuous
manner. This is not a task or responsibility of one person, one department or one
organisation. No one can do this alone. For this, the Human-Centric Co-Creation
Cycle methodology was developed, validated and deployed worldwide.


11.3.5.2 Co-creation cycle: multi-disciplinary & inter-disciplinary 


The Co-Creation Cycle is an aid that identifies the various functionals and
non-functionals that are relevant in a particular design, development, manufacturing,
logistics, monitoring, maintenance or subsequent deployment phase. It helps to
identify the various expert stakeholders that should be part of the team in order to
find, balance out, arbitrate, document and optimize a symbiosis of functionals and
non-functionals that is feasible from technical, operational, economic, ecological,
financial, ethical and legal perspectives, as well as otherwise acceptable for all
the team members. It furthermore demonstrates that both a multi-disciplinary and an
inter-disciplinary mindset and skillset are essential to make it work.

The Human-Centric Co-Creation Cycle visualised in Figure 11.6 provides an example
where, after identifying the envisioned functionality and related interfaces,
non-functionals such as security, safety, authentication, and non-personal and
personal data control, processing, protection, management and analytics need to be
part of the symbiotic equation by design. If the set of desired functionals and
relevant non-functionals ends up being too expensive, too unsustainable or otherwise
not feasible, the cycle is repeated. It can happen that it needs to be repeated
multiple times before, finally, a dynamic symbiotic equation has been established
that is deemed, by all stakeholders involved, to be feasible and acceptable for the
entire life cycle.


Figure 11.6. Human-centric co-creative cycle.
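
The repetition of the cycle until all stakeholders deem the outcome feasible and
acceptable can be sketched as a simple loop in Python. The stakeholder names, the
acceptance criteria and the revision rule are hypothetical and purely illustrative;
real co-creation involves negotiation and arbitration between people, which code
cannot capture.

```python
def co_creation_cycle(proposal, reviewers, revise, max_rounds=10):
    """Repeat the cycle until every reviewer accepts the proposal.

    Returns the accepted proposal and the number of rounds that were needed.
    """
    for round_number in range(1, max_rounds + 1):
        objections = [
            name for name, accepts in reviewers.items() if not accepts(proposal)
        ]
        if not objections:
            return proposal, round_number
        proposal = revise(proposal, objections)
    raise RuntimeError("No feasible and acceptable balance found")


# Hypothetical stakeholders and acceptance criteria.
reviewers = {
    "finance": lambda p: p["cost"] <= 100,
    "security": lambda p: p["security_controls"] >= 2,
}


def revise(proposal, objections):
    """Hypothetical revision rule: adjust the proposal per objection raised."""
    revised = dict(proposal)
    if "finance" in objections:
        revised["cost"] -= 15
    if "security" in objections:
        revised["security_controls"] += 1
    return revised


accepted, rounds = co_creation_cycle(
    {"cost": 120, "security_controls": 1}, reviewers, revise
)
```

In this example the proposal is accepted in the third round, after both the cost and
the number of security controls have been adjusted.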

This will be a main success factor in any use case, application or deployment,
if considered and included (a) by default by design upstream, (b) by default at
engineering, assembly, implementation and making available midstream, and (c) by
default before and after intended use, expected use and actual use downstream,
during its whole life cycle.

Certain AI capabilities can for sure support and facilitate risk mitigation and
have a bright future ahead, although there will be no single silver bullet if this is
done in a siloed approach. These capabilities can, however, support the
above-mentioned diverse groups of people by adding diverse groups of machines,
algorithms and capabilities to identify and address societal challenges, to find and
optimize the right symbiosis of functionals and non-functionals, and to support
making and executing well-balanced and well-informed decisions.

Furthermore, it is about double-looping and otherwise optimizing the symbiotic
equation with lessons learned. This will certainly be necessary, both because of the
dynamics mentioned earlier and because it will not be easy to make the symbiotic
equation quantitative, given the numerous qualitative qualifiers and conditions that
cannot always easily be converted into quantifications for use in this Digital Age.


11.3.5.3 Accountability in the digital age


With this, the initial foundations of accountability have been laid as well.
Accountability is not an afterthought dealt with after something goes wrong. It is an
essential requirement, before one acts as well as during and after. Accountability
is about owning and co-owning roles and responsibilities, finding solutions, making
things happen, and helping out if things go wrong once in a while. Accountability
also caters for becoming or being compliant with relevant ethics, standards and other
applicable policy and legal frameworks. Regarding policy instruments and initiatives
in the European Union associated with AI, Industry 5.0 and related domains and
topics, reference is made to the last section of this chapter. In any case,
accountability is also not about blaming others, as blaming means giving up the
power of change. And change is the only constant, also in this highly dynamic
Digital Age.


11.4 Conclusion 


AI is not about AI. It is about figuring out and helping to address challenges and
achieve objectives that matter. For that, identifying the main Societal Challenges
to be addressed, together with intertwined or otherwise related Societal Challenges,
is the best starting point. Given current developments and expectations, AI
capabilities will be necessary as an essential component to address these challenges,
add substantial and meaningful value and make it work. AI capabilities will also be
necessary to help enable and facilitate addressing the Societal Challenges that are
set, and agreed upon by the respective nations, in the 2030 Sustainable Development
Agenda of the UN and related SDGs, as well as the 2050 Paris Treaty [19] and 2050
Green Deal [20] goals of achieving net-zero and even net-negative CO2 emissions.

However, the same as with human intelligence, the numerous functionalities thereof
do not guarantee that it will work. Both in human intelligence itself and in its use
and deployment, there are non-functionals to be taken in: by traumas, by education,
by previous experience, by predictive capabilities and otherwise by nature. Things
will go wrong, will be manipulated, may seem to go well while having initially
unexpected detrimental consequences, may decrease or evaporate trust, et cetera. One
needs to make sure that these incidents, accidents or other events do not happen and
that, if they do, consequences, impact and other risk have been mitigated by design.
Non-functionals are essentials and key enablers, not problems.

Non-functionalities are as important as functionalities. Even better, they
positively augment each other if balanced out intelligently and correctly. The
symbiosis of both is a main success factor for any development and deployment of AI.
For sure, Industry 5.0 and related ecosystems, including the persons, organisations
and other stakeholders therein, can benefit from this, and can improve towards
human-centric, secure, safe, sustainable, trusted, trustworthy, resilient and
otherwise future-proof systems.

With the sequence of notions and guidance described in this chapter, one can 
better develop AI capabilities and come up with the appropriate dynamic symbiotic 
equation. That equation, at the end of the day, equals the principle of no 
surprises. Nobody likes unpleasant ones. AI that works is AI that makes it work. 
Without surprises. 


11.5 Relevant Policy Instruments & Initiatives 


[1] Clean energy for all Europeans package: https://ec.europa.eu/energy/topics/energy-strategy/clean-energy-all-europeans_en 

[2] Climate & Energy Framework 2030: https://ec.europa.eu/clima/policies/strategies/2030_en 

[3] Climate Neutral Economy by 2050: https://ec.europa.eu/clima/policies/strategies/2050_en 

[4] Coordinated Plan on Artificial Intelligence 2021 Review: https://digital-strategy.ec.europa.eu/en/library/coordinated-plan-artificial-intelligence-2021-review 

[5] Cybersecurity Act: https://eur-lex.europa.eu/eli/reg/2019/881/oj 

[6] Data Strategy: https://ec.europa.eu/info/sites/info/files/communication-european-strategy-data-19feb2020_en.pdf 

[7] Digital Compass: the European way for the digital decade: https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-age/europes-digital-decade-digital-targets-2030_en 

[8] Digital Services Act package: https://digital-strategy.ec.europa.eu/en/policies/digital-services-act-package 

[9] eIDAS: http://data.europa.eu/eli/reg/2014/910/oj 

[10] ePrivacy Directive: http://data.europa.eu/eli/dir/2002/58/oj 

[11] ePrivacy Regulation: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A52017PC0010 


[12] EU Cybersecurity Strategy: https://ec.europa.eu/commission/presscorner/detail/en/IP_20_2391 

[13] EU Green Public Procurement Criteria: https://ec.europa.eu/environment/gpp/case_group_en.htm 

[14] EU Security Union Strategy: https://ec.europa.eu/commission/presscorner/detail/en/ip_20_1379 

[15] European Commission’s President Ursula von der Leyen welcoming the Recovery Plan and the Multiannual Financial Framework: https://ec.europa.eu/commission/presscorner/api/files/document/print/en/ip_20_2073/IP_20_2073_EN.pdf 

[16] European Commission Work Programme 2021: https://ec.europa.eu/info/sites/info/files/2021_commission_work_programme_en.pdf 

[17] European Digital Strategy: https://ec.europa.eu/digital-single-market/en/content/european-digital-strategy 

[18] European Industrial Technology Roadmap for the Next Generation Cloud-Edge Offering: https://european-champions.org/2021/05/10/european-industrial-technology-roadmap-for-the-next-generation-cloud-edge-offering/ 

[19] General Data Protection Regulation: http://data.europa.eu/eli/reg/2016/679/oj 

[20] Industry 5.0: Towards a Sustainable, Human-Centric and Resilient European Industry: https://ec.europa.eu/info/news/industry-50-towards-more-sustainable-resilient-and-human-centric-industry-2021-jan-07_en 

[21] IoT Security & Privacy; Final Report European Commission of 13 January 2017 Workshop on Internet of Things Privacy and Security: https://ec.europa.eu/digital-single-market/en/news/internet-things-privacy-security-workshops-report 

[22] Katowice Climate Package: https://unfccc.int/process-and-meetings/the-paris-agreement/the-katowice-climate-package/katowice-climate-package 

[23] Machinery Directive: http://data.europa.eu/eli/dir/2006/42/oj 

[24] Next-Generation IoT & Edge Computing Strategy Forum: https://digital-strategy.ec.europa.eu/en/events/next-generation-iot-and-edge-computing-strategy-forum 

[25] NIS Directive: http://data.europa.eu/eli/dir/2016/1148/oj 

[26] Paris Agreement: https://ec.europa.eu/clima/policies/international/negotiations/paris_en 

[27] Proposal for a Regulation laying down harmonised rules on artificial intelligence (Artificial Intelligence Act): https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELLAR%3Ae0649735-a372-11eb-9585-01aa75ed71a1 

[28] Proposal for a Regulation of the European Parliament and of the Council on machinery products: https://eur-lex.europa.eu/legal-content/DA/TXT/?uri=COM%3A2021%3A202%3AFIN&qid=1518252661475 

[29] Proposal for a Regulation on European Data Governance (Data Governance Act): https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A52020PC0767 

[30] Radio Equipment Directive: http://data.europa.eu/eli/dir/2014/53/oj 

[31] Recovery and Resilience Facility: https://ec.europa.eu/info/business-economy-euro/recovery-coronavirus/recovery-and-resilience-facility_en 

[32] Revised NIS Directive: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=COM:2020:823:FIN 

[33] SDG Agenda: https://sustainabledevelopment.un.org 

[34] Sustainable Finance Taxonomy Regulation: https://ec.europa.eu/info/law/sustainable-finance-taxonomy-regulation-eu-2020-852/amending-and-supplementary-acts/implementing-and-delegated-acts_en 